Homopolymer encoded nucleic acid memory

ABSTRACT

Nucleic acid memory strands that encode digital data using a sequence of homopolymer tracts of repeated nucleotides provide a cheaper and faster alternative to conventional digital DNA storage techniques. The use of homopolymer tracts allows lower fidelity, high throughput sequencing techniques such as nanopore sequencing to read the data encoded in the memory strands. Specialized synthesis techniques allow for synthesis of long memory strands capable of encoding large volumes of data despite the reduced data density afforded by homopolymer tracts as compared to conventional single nucleotide sequences.

SEQUENCE LISTING

A “Sequence Listing XML” is submitted herewith in XML file format and (i) the name of the file is MOLA-004-03US-Seqs.xml; (ii) the date of creation is Mar. 27, 2023; and (iii) the size of the file is 17,095 bytes and the material in the XML file is incorporated by reference.

FIELD OF THE INVENTION

The invention relates to methods and apparatus for storing data in a nucleic acid memory strand comprising homopolymer tracts.

BACKGROUND

DNA digital storage is a process of representing digital data using the base sequences of DNA and storing that data through DNA synthesis of polynucleotides corresponding to the base sequence encoding the data. DNA digital storage provides several advantages over conventional data storage methods and targets a market in the tens of billions of dollars. Conventional data storage methods including flash memory and recording on magnetic tapes pose problems related to physical space requirements, reliance on scarce resources, and data integrity. DNA digital storage provides much greater data storage density with significantly lower energy requirements. Current methods rely on high-fidelity sequencing techniques with little tolerance for errors in order to accurately read the data encoded in the DNA. The required sequencing methods are relatively slow and expensive to meet the fidelity requirements. An example of current DNA digital storage techniques is described in U.S. Pat. No. 9,384,320 to Church, et al. (incorporated herein by reference). In order to increase sequencing fidelity, current methods such as those described by Church encode data using sequences that avoid features that are difficult to read or write such as sequence repeats.

The synthesis side of current DNA digital storage techniques further limits adoption of the technology through a lack of speed, production of toxic byproducts, and high costs. Most de novo nucleic acid sequences are synthesized using solid phase phosphoramidite techniques that involve the sequential de-protection and synthesis of sequences built from phosphoramidite reagents corresponding to natural (or non-natural) nucleic acid bases. While inkjet synthesis on array-based formats is capable of very low cost phosphoramidite synthesis, the strands that are made are limited to 100-200 bases in length, must sacrifice some of the length to index sequences, and are made in sub-femtomolar scale requiring post-synthesis amplification to provide sufficient material for subsequent read-out. Using conventional synthesis techniques, nucleic acids greater than 200 base pairs (bp) in length experience high rates of breakage and side reactions. Additionally, conventional synthesis techniques produce toxic by-products, and the disposal of this waste limits the availability of nucleic acid synthesizers and increases the costs of oligo production. These complications related to synthesis and read-out in DNA digital storage have limited the applications for an otherwise promising technology.

SUMMARY

The invention provides systems and methods for storing data using sequences of homopolymer tracts encoding digital data. Representing each bit in the data sequence using a homopolymer tract of repeated bases (e.g., 2-10 nucleotides) allows for higher throughput and less expensive sequencing techniques to be used. Because the sequence read relies only on discriminating the transition between homopolymer tracts and does not require a faithful read of each individual nucleotide, sequencing techniques such as nanopore sequencing, zero-mode waveguide (ZMW) single molecule sequencing, and mass spectrometry may be used to increase speed and reduce cost.

Recording of data using homopolymer tracts as described herein is most efficiently accomplished using long strands (e.g., 5-10 kb) of nucleic acid. While traditional synthesis techniques are length limited, template-independent polynucleotide synthesis using, for example, a nucleotidyl transferase is capable of synthesizing long strands at reduced costs and with lower waste production. Enzymatically synthesized ssDNA memory strands only require 50% of the DNA synthesis compared to conventional phosphoramidite approaches because, with those approaches, ssDNA strands longer than about 100-200 nucleotides in length require complex and costly ligation or PCR techniques and can only be produced from dsDNA intermediates. See U.S. Pat. No. 8,808,989 to Efcavitch, et al., incorporated herein by reference. Data encoding can be in numerical base 2, 3, or 4 using standard nucleotides, or data density can be increased using any number of modified nucleotide analogs to generate base 8, 10, 12, or higher encoding schemes.

The limitations on modified nucleotide analogs are only that they can be incorporated using the chosen synthesis technique (e.g., terminal deoxynucleotidyl transferase (TdT)) and can be differentiated from one another using the chosen sequencing analysis. In some embodiments, synthesis may be accomplished using polymerase theta in the presence of Mn²⁺.

Consistent homopolymer tract length is not essential to the systems and methods of the invention because it is only the transition between individual tracts that needs to be recognized. Even though the tract length can be allowed to vary, synthesis techniques of the invention can effectively control the average homopolymer tract length by adjusting the ratio of deoxynucleotides (dNTPs) to the oligonucleotide memory strands being synthesized and controlling the exposure time of the dNTPs to the nascent memory strand. The length of the homopolymer tracts can be optimized to the readout technology; the highest data storage density is achieved with single nucleotide readout resolution, but the highest readout speed and accuracy are achievable by expanding the size of the nucleotide bit to the minimum detectable length (e.g., 2-10 nucleotides) for a given sequencing technology.

Systems and methods of the invention that use nanopore sequencing may use specialized memory strand constructs such as stoppers (e.g., hairpins or macromolecular appendages) included on one or both ends of the strand. In other nanopore-based methods, the memory strand may be circularized and threaded between adjacent nanopores.

Certain aspects of the invention include a method of recording data using a nucleic acid memory strand. Steps of the method may include creating an in-silico oligonucleotide sequence that represents a dataset where each nucleotide of the oligonucleotide sequence corresponds to a unit of said dataset. A nucleic acid memory strand can then be synthesized comprising a plurality of homopolymer tracts where each homopolymer tract corresponds to a nucleotide of the oligonucleotide sequence. The plurality of homopolymer tracts may include between 3 and 10 repeated nucleotides. Each unit of said dataset can be represented in base 2, base 3, base 4, or higher as desired for a particular application.

In certain embodiments, the nucleic acid memory strand may be from at least about 200 nucleotides in length to about 5,000 nucleotides in length. The synthesizing step may include controlling homopolymer length by varying dNTP concentration. Steps of the method may include modifying a first end of the nucleic acid memory strand to prevent passage of the first end through a nanopore of a nanopore sequencing system; passing a second end of the nucleic acid memory strand through the nanopore; and modifying the second end of the nucleic acid memory strand to prevent passage of the second end through the nanopore.

Other embodiments may utilize a memory strand comprised of heteropolymer tracts of a defined stoichiometry or composition to further increase the coding capacity of a set number of nucleotide analogs. Further embodiments may seek to protect the data encoded in a memory strand by using nucleotide analogs that are similar in structure but employ linkers that can be removed under different conditions such as ultraviolet or visible light, oxidizing or reducing agents, alkaline or acidic pH, or sequence specific nucleases, thereby disguising the data to those without knowledge of the applicable process.

The dataset may be selected from the group consisting of a text file, an image file, and an audio file. The synthesizing step may include template-independent synthesis. In certain embodiments, a nucleotidyl transferase enzyme may be used to catalyze said template-independent synthesis. Polymerase theta can be used to catalyze said template-independent synthesis in some embodiments.

Aspects of the invention may include a method of reading data from a nucleic acid memory strand. Steps of the method can include sequencing a nucleic acid memory strand comprising a plurality of homopolymer tracts; converting the nucleic acid memory strand sequence into digitized data, wherein each of the plurality of homopolymer tracts represents a nucleotide corresponding to a unit of data; and converting the digitized piece of data to a readable format. Steps of the method may include displaying the readable format. The plurality of homopolymer tracts may include between about 2 and about 10 nucleotide repeats. The nucleic acid memory strand may be between at least about 200 nucleotides and about 5,000 nucleotides in length.

In various embodiments, the sequencing step can comprise nanopore sequencing, sequencing by synthesis, or mass spectrometry. The sequencing, translating, and converting steps may be repeated one or more times on the nucleic acid memory strand.

Other aspects of the invention are apparent to the skilled artisan upon consideration of the following figures and detailed description.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 shows an enzymatic synthesis cycle used to form homopolymer tracts.

FIG. 2 shows a method of synthesizing a nucleic acid memory strand with homopolymer tracts.

FIG. 3 shows a method of reading data from a nucleic acid memory strand with homopolymer tracts.

FIG. 4 shows the relationship of strands/GB to homopolymer length and strand length.

FIG. 5 shows the data that can be encoded in a DNA strand comprised of 500 polymer tracts as a function of the number of distinguishable nucleotide analogs in a memory strand.

FIG. 6 shows a nanopore trapped nucleic acid memory strand of the invention.

FIG. 7 shows a scheme for changing the data encoded within a strand of homopolymer tracts in response to treatment conditions.

FIG. 8 shows a system for synthesizing nucleic acid memory strands with homopolymer tracts.

FIG. 9 shows a system for the parallel synthesis of nucleic acid memory strands with homopolymer tracts on an array of nanowells.

FIG. 10 gives a more detailed schematic of components that may appear within a system.

FIG. 11 shows analysis of the enzymatically-mediated synthesis of two different base composition homopolymer tracts.

FIG. 12 shows the process for converting character text into a 12-bit nucleic acid sequence.

FIG. 13 shows polyacrylamide gel electrophoresis (PAGE) analysis of twelve cycles of enzymatic synthesis.

FIG. 14A-L shows the experimentally found homopolymer distribution for each of the 12-bit homopolymer additions.

FIG. 15 illustrates cycle steps for the template independent DNA polymerase mediated synthesis of homopolymer encoded information polymers.

FIG. 16 shows a side view of a typical reaction zone, in a 2D array of reaction zones, showing hydrophobic patterning isolating a liquid reaction droplet.

FIG. 17 shows a side view of a reaction zone with a controlled heater element or electrochemical element.

FIG. 18 shows a reaction zone showing a light transparent top cover.

FIG. 19 shows an exemplary apparatus useful for the addressable light activation of enzymatic DNA synthesis.

FIG. 20 shows kinetics of UV light decaging of 3′-O-(2-nitro)-benzyl dATP.

FIG. 21 shows optically controlled oligonucleotide extension. An enzymatic reaction mixture containing 3′-orthonitrobenzyl dATP was irradiated with a low-power light source for various intervals so that the exposure time controlled the amount of decaging. The amount of usable natural dATP formed in turn controlled the average length of the polynucleotide tract.

FIG. 22 shows structures of 3′-O-caging modifications allowing two-color light-mediated decaging.

FIG. 23 shows gel electrophoresis analysis of each of 12 cycles of homopolymer synthesis using 3′-O-(2-nitro)-benzyl dATP and dCTP, dGTP, dTTP.

FIG. 24 shows electrophoretic analysis of enzymatic synthesis reactions comparing incorporation modulating dNTP analogs versus unmodified dNTP.

FIG. 25 shows electrophoretic analysis of two consecutive cycles of incorporation rate modulating dNTP analogs. Modified analogs were used to form N+1 homopolymers, then followed by N+2 homopolymer additions. Homopolymer synthesis rate modulating modifications were removed from the oligonucleotide prior to gel analysis.

FIG. 26 shows an electropherogram of TdT extension reactions showing homopolymer rate modulating analog incorporation followed by a second cycle of homopolymer rate modulating analog incorporation. Modified nucleotide followed by consecutive modified nucleotide.

FIG. 27 shows a modified dNTP analog used for information polymer composition.

FIG. 28 shows electrophoretic analysis of an information polymer composed of modified nucleotides.

FIG. 29 shows detection of an information polymer (P71) composed of modified nucleotides by translocation through a nanopore.

FIG. 30 shows detection of an information polymer (86B) composed of modified nucleotides by translocation through a nanopore.

DETAILED DESCRIPTION

The invention provides systems and methods for writing data to and reading data from nucleic acids having homopolymer tracts corresponding to units of digital data. By repeating (e.g., 3-10 times) each nucleotide in the data-encoding sequence, only the transition between homopolymer tracts needs to be observed in sequencing reads, allowing for lower fidelity, higher throughput sequencing techniques that can result in cheaper execution in nucleic acid data storage. Advantages of synthesizing nucleic acid homopolymer tract memory strands are: 1) the ability to make very long (5-10 kb) strands, which enables the use of high throughput, long read DNA sequencing technologies for readout, 2) the ability to tolerate errors in sequencing readout technologies and 3) the ability to make nucleic acid memory strands with costs far less than that of conventional chemical synthesis methods. The use of homopolymer nucleic acid memory strands is best realized in long (e.g., 5-10 kb) strands that can be efficiently produced using template-independent TdT enzymes or polymerase theta wherein homopolymer tract length can be controlled by altering exposure time and dNTP to polynucleotide memory strand ratio.

The synthesis of homopolymers for encoding data by an enzymatically mediated approach is easily achieved by using natural or modified nucleotide triphosphates that are not terminators, resulting in the simplest and most rapid method of DNA synthesis possible. One natural or modified nucleotide triphosphate is delivered to a reaction zone with a nucleotidyl transferase, allowed to react and then removed by washing with a buffer, thus completing one “write” cycle of data storage as illustrated in FIG. 1. The data strand synthesis occurs in an entirely aqueous environment, with no toxic or hazardous chemicals, thus enabling practical devices suitable for large scale data storage centers.

FIG. 2 shows a method 101 of synthesizing a nucleic acid memory strand with homopolymer tracts according to certain embodiments. The method 101 includes creating 103 an in-silico oligonucleotide sequence representing a dataset. The dataset may comprise digitized data that may represent text, an image, a video, audio, or any other piece of information that may be digitized. The oligonucleotide sequence may comprise any number of natural or modified nucleotides or analogs thereof and may encode the dataset using a base 2, base 3, base 4, or greater scheme depending on the number of unique nucleotides or analogs used in the memory strand. In a simple embodiment, the encoding scheme may correspond to a binary data scheme conventionally represented by a series of 0s and 1s where one or more nucleotides or analogs may correspond to 0s and one or more other nucleotides may correspond to 1s. A nucleic acid memory strand (e.g., RNA, single-stranded, or double-stranded DNA) comprising a series of homopolymer tracts each corresponding to a nucleotide, in order, in the in-silico oligonucleotide sequence can then be synthesized 105. In certain embodiments, further steps include modifying 107 one end of the memory strand, threading 109 the memory strand through a nanopore and modifying 111 the other end of the strand to prevent the ends from passing through the nanopore.
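The creating step 103 and the tract expansion underlying the synthesizing step 105 can be sketched in silico as follows. This is only an illustrative sketch: the function names, the digit-to-nucleotide mapping (0, 1, 2, 3 to A, C, G, T), and the fixed nominal tract length are assumptions made for the example and are not specified in the description.

```python
# Illustrative sketch only; names and the digit-to-base mapping are assumptions.
ALPHABET = "ACGT"  # assumed mapping: digit 0->A, 1->C, 2->G, 3->T


def to_base_n(data: bytes, n: int) -> list[int]:
    """Convert each byte into a fixed number of base-n digits (most significant first)."""
    digits_per_byte = 1
    while n ** digits_per_byte < 256:  # smallest digit count that covers 0..255
        digits_per_byte += 1
    digits = []
    for byte in data:
        chunk = []
        for _ in range(digits_per_byte):
            chunk.append(byte % n)
            byte //= n
        digits.extend(reversed(chunk))
    return digits


def encode_strand(data: bytes, n: int = 4, tract_len: int = 3) -> str:
    """Expand each base-n digit into a homopolymer tract of nominal length tract_len."""
    if not 2 <= n <= len(ALPHABET):
        raise ValueError("this sketch supports base 2 to base 4 only")
    return "".join(ALPHABET[d] * tract_len for d in to_base_n(data, n))


if __name__ == "__main__":
    # 0x41 ("A") is 1,0,0,1 in base 4, i.e. tracts C, A, A, C.
    print(encode_strand(b"A"))
```

In this sketch two adjacent identical digits simply produce one tract of twice the nominal length; a real encoder might instead choose a code that avoids adjacent identical tracts.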

FIG. 3 shows a method 201 of reading data from a nucleic acid memory strand with homopolymer tracts. Steps of the method 201 include sequencing 203 a series of homopolymer tracts in a nucleic acid memory strand, converting 205 the sequence to a dataset, converting the dataset into a readable format (e.g., an image, a video, an audio clip, or a piece of text), and optionally displaying 209 that readable format of data (e.g., on a monitor or using a printer or other input/output device).
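The converting steps of the reading method can be sketched as a run-length collapse followed by a base conversion. The sketch below is a minimal illustration under stated assumptions: the function names, the A/C/G/T to 0/1/2/3 mapping, and the rule that a run is counted as the nearest whole number of nominal tracts are all hypothetical choices, not details taken from the description.

```python
# Illustrative sketch only; names, mapping, and the rounding rule are assumptions.
from itertools import groupby

ALPHABET = "ACGT"  # assumed mapping: A->0, C->1, G->2, T->3


def read_tracts(strand: str, tract_len: int = 3) -> list[str]:
    """Collapse each run of identical bases into one or more tract symbols."""
    symbols = []
    for base, run in groupby(strand):
        run_len = sum(1 for _ in run)
        # Round run_len / tract_len to the nearest integer (half up), at least 1,
        # so a run of about twice the nominal length counts as two tracts.
        copies = max(1, (2 * run_len + tract_len) // (2 * tract_len))
        symbols.extend(base * copies)
    return symbols


def decode_strand(strand: str, n: int = 4, tract_len: int = 3) -> bytes:
    """Convert a sequenced strand back into bytes."""
    digits = [ALPHABET.index(s) for s in read_tracts(strand, tract_len)]
    digits_per_byte = 1
    while n ** digits_per_byte < 256:
        digits_per_byte += 1
    if len(digits) % digits_per_byte:
        raise ValueError("tract count is not a whole number of bytes")
    out = bytearray()
    for i in range(0, len(digits), digits_per_byte):
        value = 0
        for d in digits[i:i + digits_per_byte]:
            value = value * n + d
        out.append(value)  # raises ValueError if the digits exceed one byte
    return bytes(out)


if __name__ == "__main__":
    print(decode_strand("CCCAAAAAACCC"))   # exact tract lengths
    print(decode_strand("CCAAAAAAACCCC"))  # tract lengths off by one
```

Because only run boundaries and approximate run lengths are used, both example reads decode to the same byte.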

Preferably, systems and methods of the invention use long strands of DNA (5-10 kb) that are either single stranded or double stranded and may be naturally occurring or generated by chemical or enzymatic synthesis. In certain embodiments, the nucleic acid memory strands may be generated enzymatically using TdT to create a series of homopolymer tracts that may be 2-10, 3-10, 4-10 nucleotides or longer. The homopolymer tracts can each consist of Adenine (A), Guanine (G), Cytosine (C), or Thymine (T). The sequence of alternating homopolymer tracts can be used to encode the data that is to be stored in the memory strand.

Each nucleotide homopolymer tract can represent various amounts of data depending on the number of bases used. The number of homopolymer tracts (“bits”) required to make one byte (decimal 256) is defined by the following relationship: #bits/byte = 8/(log2(n)), where n is the numerical base that is used. Each tract can correspond to one bit if two base encoding is used or ¼ of a byte if four base encoding is used. In certain embodiments, DNA data strands may be composed of 3-10 nucleotide homopolymer tracts (using a base 2 dataset representation), which would allow 333 bits to 100 bits to be represented in a memory strand between 999 and 1000 bases long. In preferred embodiments, nucleic acid base encoding of data may be such that a homopolymer tract of one nucleotide is always adjacent to a homopolymer tract of a different nucleotide. For example, encoding may be such that a homopolymer tract of Adenine would not be immediately preceded or followed by another Adenine homopolymer tract. In the case where two adjacent homopolymer tracts comprise the same nucleotide, a homopolymer tract may be synthesized that is measurably longer than the average homopolymer tracts representing single nucleotides in the encoded data sequence. Those longer tracts may be created through manipulation of the synthesis reactions described below by, for example, increasing the concentration of dNTPs in the reaction or increasing the reaction time. The length of two adjacent identical homopolymer tracts need only be long enough to be unambiguously distinguished from single homopolymer tracts using the readout device (e.g., a nanopore sequencer). In certain embodiments, a non-nucleotide homopolymer spacer could be added between A, G, C, or T homopolymer tracts to clearly distinguish adjacent same nucleotide homopolymer tracts from one another.

The use of A, G, C, & T homopolymer tracts enables the creation of a base 4 encoding space, increasing the density of data that can be stored in one contiguous strand rather than simply using two nucleotides (similar to 0s and 1s in binary code). For example, four contiguous homopolymer tracts can encode 256 digits (i.e., one byte) if A, G, C, & T are used in a base 4 scheme. In such an embodiment, there would be 83 bytes or 25 bytes represented in a 996 or 1000 nucleotide long nucleic acid memory strand if three (3) nucleotide long homopolymer tracts or ten (10) nucleotide long homopolymer tracts were used respectively.
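The capacity figures quoted above can be checked with a few lines of arithmetic. The sketch below is illustrative only; the function names are hypothetical, and it assumes that each byte is packed into a whole number of tracts of uniform length.

```python
# Illustrative arithmetic check; function names are hypothetical.

def tracts_per_byte(n: int) -> int:
    """Smallest whole number of base-n tracts that can represent 256 values."""
    k = 1
    while n ** k < 256:
        k += 1
    return k


def tracts_per_strand(strand_len: int, tract_len: int) -> int:
    """Number of complete tracts of uniform length that fit in the strand."""
    return strand_len // tract_len


def bytes_per_strand(strand_len: int, tract_len: int, n: int) -> int:
    """Whole bytes that fit in the strand under the assumptions above."""
    return tracts_per_strand(strand_len, tract_len) // tracts_per_byte(n)


if __name__ == "__main__":
    # Base 2: one tract per bit.
    print(tracts_per_strand(999, 3), tracts_per_strand(1000, 10))
    # Base 4: four tracts per byte.
    print(bytes_per_strand(996, 3, 4), bytes_per_strand(1000, 10, 4))
```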

In various embodiments, base 8 or even base 12 coding schemes may be employed through the incorporation into the memory strand of homopolymer tracts of uniquely modified nucleotide or non-nucleotide analogs. Those modified nucleotide or non-nucleotide analogs should generate a unique digital signal with a readout device like a nanopore sequencer or a single molecule ZMW sequencer. TdT, as discussed below, can incorporate a wide range of modified dNTP analogs that can enhance the signal provided by a readout device like a nanopore and thus may be used for generating nucleic acid memory strands with data encoded in them. Homopolymers of modified nucleotides (e.g., A*, G*, C* & T* or A**, G**, C** & T**) can be synthesized using TdT and modified dNTP analogs (e.g., dA*TP or dA**TP) of each of the four bases to generate a base 8 or even a base 12 encoding scheme. Higher base (n) encoding allows for data compression and results in a reduction in the number of DNA strands that are required to encode a given amount of information. The relationship determining the number of DNA strands per GB of data as a function of the length of the homopolymer tract, the numerical base (n), and the strand length synthesized is defined by: #strands/GB = (8/(log2(n))) × 10⁹ × (homopolymer length) / (strand length), as illustrated in FIG. 4.
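The strands-per-GB relationship can be evaluated directly. The following sketch is illustrative only: the function name and the example parameter values are assumptions, and one GB is taken as 10⁹ bytes as in the relationship above.

```python
# Illustrative evaluation of the strands/GB relationship; names and the
# example values are assumptions, not figures from the description.
import math


def strands_per_gb(n: int, homopolymer_len: float, strand_len: float) -> float:
    """(8 / log2(n)) * 1e9 * homopolymer length / strand length."""
    tracts_per_byte = 8 / math.log2(n)
    return tracts_per_byte * 1e9 * homopolymer_len / strand_len


if __name__ == "__main__":
    for base in (2, 4, 8, 12):
        # Hypothetical example: 3-nt tracts on 5,000-nt strands.
        print(base, f"{strands_per_gb(base, 3, 5000):.3g}")
```

Raising the base or the strand length lowers the strand count, while longer homopolymer tracts raise it.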

The number of unique homopolymer tracts may be limited only by the ability of the readout technology (e.g., nanopore or ZMW single molecule sequencing) to distinguish one homopolymer tract from another. There are several reports in the literature of the detection of homopolymers composed of unmodified nucleotides by detecting the change in the ionic current during translocation through nanopores (Venta et al, 2013; Feng et al, 2015). Modifications that alter the dwell time of the DNA in the nanopore will generate a distinguishable and characteristic ionic-current signal. Singer et al 2010 and Morin et al 2016 use non-covalently bound bisPNA or γPNA functionalized with 5 or 10 kDa PEG to enhance detection by nanopores. Liu et al 2015 selectively created adamantyl 8-oxoG analogs to modify the dwell time and generate a unique signal. Given the tolerance of TdT to incorporating bulky modifications at N6 of dATP, N4 of dCTP, N2 or O6 of dGTP and O4 or N3 of dTTP, acyl or alkyl modifications at those positions may be screened and chosen to enhance the detection modality of nanopore or ZMW single molecule sequencing technologies. Detection may be improved through modified nucleotides that enhance the differential current blockade in a nanopore or enhance the dwell time of a modified nucleotide in the active site of a DNA polymerase in a ZMW single molecule approach. Other natural and non-natural purine and pyrimidine nucleotide analogs may be used if they generate a unique digital signal with a readout device like a nanopore sequencer or a single molecule ZMW sequencer. Modifications at the C5 or C7 of pyrimidines and purines respectively may be used if they generate a unique digital signal with a readout device like a nanopore sequencer or a single molecule ZMW sequencer. Suitable modified nucleotide triphosphates are chosen to be rapidly incorporated during the enzymatic extension step and to provide a substitution-specific dwell time with as short a homopolymer as possible during the detection step.

Examples of modifications to A, G, C, & T bases suitable for expanding the bit encoding space may include but are not limited to N6-benzoyl dA, N6-benzyl dA, N6-alkyl dA, N6-acyl dA, N6-substituted alkyl dA, N6-substituted acyl dA, N6-aryl acyl dA, N6-substituted aryl acyl dA, N2-alkyl-dG, N2-acyl dG, N2-aryl acyl dG, N2-substituted alkyl dG, N2-substituted acyl dG, N2-substituted aryl acyl dG, O6 alkyl dG, O4 alkyl dT, N3 alkyl dT, N3 acyl dT, O6 substituted alkyl dG, O4 substituted alkyl dT, C5-propargyl amine dT, C5-propargyl amine dC, C7-propargyl amine dA, C7-propargyl amine dG, substituted C5-propargyl amine dT, substituted C5-propargyl amine dC, substituted C7-propargyl amine dA, substituted C7-propargyl amine dG. Preferred embodiments of substitutions include but are not limited to covalent attachments that are completely stable to removal except under the most extreme chemical conditions of pH, temperature and concentration of reactive species. Substitutions that are able to effect a unique current blockade may include but are not limited to alkyl, heteroatom substituted alkyl, aromatic hydrocarbons, alkyl substituted aromatic hydrocarbons, heteroatom substituted alkyl substituted aromatic hydrocarbons, heteroatom substituted aromatic hydrocarbons, benzyl, substituted benzyl or combinations thereof. In some embodiments, the substitutions can be polyethylene glycols composed of 2 to 450 monomer units. In some embodiments, substitutions comprised of peptides or peptoids may be suitable to increase the dwell time of homopolymers in a specific and discernible manner. The efficiency of incorporation of modified nucleotides by template-independent polymerases like TdT may be modulated by the use of different metal ion cofactors such as but not limited to Co²⁺, Zn²⁺, Mg²⁺, Mn²⁺, or mixtures of two or more different metal ions. Each modified nucleotide may require a different metal ion for optimal performance during enzymatic homopolymer synthesis.

Since long term stability of the DNA data strands is essential, there is a distinct advantage to using non-purine based homopolymers since purines are subject to depurination at low pH. In some embodiments, the homopolymer bits can be composed of only a single nucleotide type (e.g., Thymine) that is modified with two, three, four or more different chemical groups resulting in homopolymer tracts that each result in a unique current blockade. Thus, one nucleotide labeled with four unique modifiers can substitute for the presence of A, G, C, T. Other embodiments that use only one of the other three nucleotides with two, three, four or more different chemical groups are possible.

If the modified nucleotide analogs are sufficient to cause a unique dwell time for the passage of a single nucleotide through a nanopore, another embodiment would use single nucleotide bits instead of homopolymer bits. Single modified nucleotide bits would be advantageous in allowing the maximum density of information per DNA strand, thus reducing the cost of DNA based data storage.

The precise length of the homopolymer tract is not critical so long as the sequencing technology used for the read-out can clearly distinguish the start and stop of one homopolymer tract from another. Although there are distinct synthesis and storage density advantages to increasing the number of unique nucleotides or bases (including modified nucleotide or non-nucleotide analogs) used and therefore reducing the length needed to capture a set amount of data, the lowest cost per DNA data storage synthesis may be achieved by using four natural nucleotide dNTP monomers during enzymatic synthesis since those reagents are widely used in the molecular biology & sequencing fields and are produced in very large batches with the lowest manufacturing cost. The cost of production of dNTP analogs to increase the number of unique homopolymer tracts may be reduced as the use of DNA data storage increases and manufacturing scale of analogs is also increased.

Any method of synthesizing the homopolymer tract segment may be used with systems and methods of the invention but preferred embodiments use the template-independent enzyme TdT. TdT provides certain benefits insofar as it will rapidly and inexpensively generate a homopolymer with a Poisson distribution where the average size of the homopolymer may be strictly controlled by the ratio of the [dNTP] to the nascent oligonucleotide memory strand. In some embodiments, polymerase theta in the presence of Mn²⁺ can be used as a template-independent polymerase to synthesize homopolymer tract nucleic acid memory strands. In another embodiment, the length of the homopolymer tract segments can be controlled by delivering an excess of dNTP to a reaction zone and then removing the reactants after a carefully controlled interval of time.
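The Poisson behavior described above can be illustrated numerically. The sketch below is a simplified model and not a description of the actual enzyme kinetics: it assumes complete, unbiased incorporation so that the mean tract length equals the ratio of dNTP molecules to 3′-ends, and the function names and the example ratio of 5 are hypothetical.

```python
# Simplified, illustrative model; assumes mean tract length = dNTP : 3'-end ratio.
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of adding exactly k nucleotides when the mean addition is `mean`."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def fraction_in_range(mean: float, lo: int = 2, hi: int = 10) -> float:
    """Fraction of strands whose tract length falls between lo and hi inclusive."""
    return sum(poisson_pmf(k, mean) for k in range(lo, hi + 1))


if __name__ == "__main__":
    ratio = 5.0  # hypothetical dNTP : 3'-end ratio
    print(f"fraction of tracts with 2-10 nt: {fraction_in_range(ratio):.3f}")
    print(f"fraction with no addition:       {poisson_pmf(0, ratio):.4f}")
```

Under this model a small fraction of strands receives no nucleotide in a given cycle, which is one reason a designer might choose the ratio with the target tract range in mind.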

TdT has demonstrated the ability to synthesize homopolymer tracts of a fairly defined length by controlling the ratio of dNTP concentration to the concentration of 3′-ends of the nucleic acid strand desired to be modified. Inkjet synthesis on array-based formats is capable of very low cost phosphoramidite synthesis but the strands that are made are limited to 100-200 bases in length, must sacrifice some of the length to index sequences, are made in sub-femtomolar scale requiring post-synthesis amplification to provide sufficient material for subsequent read-out and are mostly suited for relatively inefficient short read sequencing readout technologies.

Strands of single stranded DNA synthesized according to processes of the invention may benefit from the prevention of hairpins or dsDNA either during the synthesis or during the readout. Hairpin formation can be prevented by modifying the exocyclic amines of one member of an A:T or G:C base pair to prevent the hydrogen bonding necessary for base pairing. In some embodiments, the exocyclic amines may be modified by acylation or alkylation. Any simple and stable modification of the exocyclic amines of A, G or C which prevents base pairing can be used to prevent hairpin formation. In certain embodiments, the N6 of deoxyadenosine and the N2 of deoxyguanosine may be acetylated, preventing base pairing. In some embodiments, the N6 of deoxyadenosine and the N4 of deoxycytidine can be modified to prevent base pairing and hairpin formation. In some embodiments, the O6 of deoxyguanosine or the O4 of deoxythymidine can be modified to prevent base pairing and hairpin formation. In some embodiments, the O4 of deoxythymidine or the N3 of deoxythymidine can be modified to prevent base pairing and hairpin formation. In some embodiments, modifications to A, G, C, or T to generate higher order base encoding schemes also serve the purpose of preventing base pairing and hairpin formation. In some embodiments, homopolymer bits can be composed of only a single nucleotide type (e.g., Thymine) that is modified with two, three, four or more different chemical groups that result in a unique current blockade and prevent the formation of intra- or inter-strand double strand regions. In another embodiment, a thermostable version of TdT or another template-independent nucleotidyl transferase can be used to perform strand synthesis at an elevated temperature, thus preventing the formation of intra- or inter-strand double strand regions.

Control of the homopolymer tract length can be optimized for any analogs as described above after determination and calibration of the incorporation rate of the dNTP analog to create a reproducible range of homopolymer tract lengths of 2-10 nucleotides in length. The use of A*, G*, C*, & T* and A**, G**, C** & T** homopolymer tracts allows the creation of a base 8 or base 12 encoding, increasing the density of data that can be stored in one contiguous strand rather than simply using two nucleotides to encode for a “0” and a “1”. Three (3) contiguous homopolymer tracts can encode 256 digits (i.e., one byte) if A, G, C, T, A*, G*, C*, & T* are used. In such embodiments, there would be 111 bytes or 33 bytes in a 999 or 990 nucleotide long nucleic acid memory strand if three (3) nucleotide long homopolymer tracts or ten (10) nucleotide long homopolymer tracts were used respectively. Two (2) contiguous homopolymer tracts can encode 144 digits (12²), somewhat short of the 256 digits of one byte, if A, G, C, T, A*, G*, C*, T*, A**, G**, C**, & T** are used; by the relationship given above, 8/(log2(12)) ≈ 2.2 tracts are needed per byte. Approximating two tracts per byte in those embodiments, there would be about 166 bytes or 50 bytes in a 996 or 1000 nucleotide long nucleic acid memory strand if three (3) nucleotide long homopolymer tracts or ten (10) nucleotide long homopolymer tracts were used respectively.

In certain embodiments, data may also be encoded in the memory strand using heteropolymer tracts of random sequence and defined composition to achieve a higher level of data compression. The heteropolymer stretches can be generated with enzymatic reactions using mixtures of different dNTPs, where the dNTP stoichiometry is used to control the composition of the heteropolymer tracts. The number and type of heteropolymer tracts is limited only by the combinations of dNTP analogs and the ability of the detection modality to distinguish the compositions of the different tracts. For m dNTP analogs, there are (m²-m)/2 binary combinations for heteropolymer formation. Detection modalities which can distinguish two different levels of tract composition for each binary combination (e.g., a tract where analogs A and B are present in an approximately 2:1 ratio and a tract where they are present in a 1:2 ratio) allow data to be encoded at a rate of base m² from a set of m analogs, since the m homopolymer tracts and the m²-m distinguishable heteropolymer tracts together provide m² symbols, effectively doubling the coding capacity of the memory strands. FIG. 5 illustrates the data that can be stored in a memory strand as a function of the number of available dNTP analogs using either a homopolymer or a binary heteropolymer-based encoding scheme with two levels of tract composition.
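The symbol count described above can be checked with a short calculation. This is a sketch only: it assumes that each binary combination contributes exactly two distinguishable composition levels, as in the 2:1 and 1:2 example, and the function names are illustrative.

```python
import math


def binary_combinations(m: int) -> int:
    """Unordered pairs of distinct analogs: (m^2 - m) / 2."""
    return (m * m - m) // 2


def symbols_per_tract(m: int, levels: int = 2) -> int:
    """m homopolymer symbols plus `levels` heteropolymer symbols per analog pair."""
    return m + levels * binary_combinations(m)


for m in (2, 4, 8, 12):
    total = symbols_per_tract(m)
    # With two levels the total equals m squared, so bits per tract double
    print(m, total, math.log2(m), math.log2(total))
```

With two composition levels the total is m + (m²-m) = m², so the information per tract rises from log₂(m) to 2·log₂(m) bits.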

Data encoding strands of the invention may not necessarily require precisely defined homopolymer lengths since they only need to be long enough (ca. 2-10 nucleotides) to allow the unambiguous discrimination of the transition between homopolymer tract segments by a high throughput DNA sequencing technology. Existing Next Generation Sequencing-by-Synthesis (SBS) systems can readily determine the transition between two adjacent homopolymer tracts. Again, the precise length of the homopolymer tract is not important to the accurate detection of a homopolymer bit. The use of tracts of the same nucleotide offers advantages in overcoming the most common errors in current SBS platforms: insertions and deletions. The deletion of one nucleotide in a homopolymer tract >2 nt will still be interpreted as a true homopolymer. Likewise, the insertion of a single nucleotide in a homopolymer tract would not be falsely interpreted as two adjacent homopolymers since the insertion of more than one nucleotide during SBS is an unlikely event. This sequencing error tolerance offers the advantage of decreasing the sequencing depth required to ensure correct decoding of the information stored by the DNA data strand. Existing nanopore systems can easily distinguish homopolymer tracts of A, G, C, or T from each other based on their differential current blockade. In certain embodiments, single molecule ZMW sequencing can be used to determine the linear order of homopolymer tracts on a linear strand. The use of either sequencing technology may require that the DNA initiator has properties that are compatible with the sequencing readout technology, such as a self-complementary hairpin at the 5′-end of a synthesized single stranded memory strand to provide a primer for single molecule ZMW sequencing. Nanopore sequencing technology may also require a self-complementary hairpin at the 5′-end of the strand to provide a “start” data mark. In various embodiments, the readout technology may be any next generation sequencing method such as that offered by Illumina (San Diego, Calif.). In some embodiments, the readout or sequencing technology may be mass spectrometry based. The technology-specific error rate of the readout technology is not important so long as it can unambiguously detect the transition between two different homopolymer tracts and/or unambiguously detect the difference between one homopolymer tract length and one 2× in length in the case where two identical homopolymer tracts are adjacent to each other.
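The tolerance to single-nucleotide insertions and deletions described above can be illustrated by a decoder that collapses each run of identical bases into tract symbols. This is a minimal sketch under stated assumptions: the function name `decode_tracts` and the parameter `nominal_len` (an assumed nominal tract length known to the decoder) are hypothetical, reads are plain base strings, and substitution errors are not handled.

```python
from itertools import groupby


def decode_tracts(read: str, nominal_len: int) -> list[str]:
    """Collapse a base-level read into a list of homopolymer tract symbols.

    A run close to the nominal length counts as one tract; a run close to
    twice the nominal length counts as two adjacent identical tracts.
    """
    symbols = []
    for base, run in groupby(read):
        run_len = sum(1 for _ in run)
        # Round to the nearest whole number of tracts, never fewer than one
        n_tracts = max(1, round(run_len / nominal_len))
        symbols.extend([base] * n_tracts)
    return symbols


# A tract of 5, a T tract with one deletion, a C tract with one insertion
print(decode_tracts("AAAAATTTTCCCCCC", nominal_len=5))
# Two adjacent A tracts read with one deletion, followed by a T tract
print(decode_tracts("AAAAAAAAATTTTT", nominal_len=5))
```

Because only run boundaries and the approximate 1× versus 2× length matter, a single inserted or deleted base inside a tract leaves the decoded symbol sequence unchanged in this sketch.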

Certain read-out technologies may be preferable to others based on the specific application of the invention. Technologies such as nanopore sequencing can be non-destructive and leave the nucleic acid memory strand intact, suitable for multiple read-out cycles. Read-out technologies that are dependent on Sequencing by Synthesis (SBS), like ZMW single molecule sequencing and others, generate a copy of the original template strand and may require post-readout manipulation (e.g., strand separation by melting) to remove the complementary strand and return the original nucleic acid memory strand to its pristine state, ready for a subsequent cycle of readout. Other readout technologies, like mass spectrometry, are destructive and would deplete the pool of nucleic acid memory strands after repeated cycles of sampling and readout.

In various embodiments, nucleic acid memory strands may include “stoppers”. “Stoppers” may be macromolecular constructs which prevent the passage of a single stranded or double stranded nucleic acid through a nanopore (Manrao, et al., 2012, Reading DNA at single-nucleotide resolution with a mutant MspA nanopore and phi29 DNA polymerase, Nature Biotechnology 30, 349-353, incorporated herein by reference). Proteins like phi29 DNA polymerase are large enough to not be drawn through the larger pore (about 6.3 nm) on the cis side of a protein nanopore. The diameter of the smaller side of the pore is estimated to be about 1.2 nm. Stoppers can be used on either the 5′- or 3′-end of a nucleic acid memory strand of the invention. In some applications, it may be desirable to have a stopper at both the 3′- and 5′-ends of a nucleic acid molecule. Stoppers can consist of a hairpin (stem-loop) structure with either a protruding 5′- or 3′-overhang to which the information encoding nucleic acid is covalently attached. If the stopper consists of a hairpin, the length of the ds stem may be sufficiently long to resist any melting force exerted on it by the electric field used to translocate the memory strand through the nanopore. In certain embodiments, one base of the double-stranded stem region can be crosslinked to the cognate base with which it forms a base pair so that it is impossible for the double-stranded stem portion of the hairpin to melt under the influence of the force exerted on it by the electric field that translocates the rest of the molecule through a nanopore. To utilize a hairpin stopper for TdT mediated nucleic acid memory synthesis according to certain embodiments, the stopper may have a 3′-overhang of sufficient length (e.g., >10 nucleotides) to allow the binding of TdT for template independent synthesis.

Stoppers may consist of a non-nucleotide macromolecular construct which can be appended to either the 3′- or 5′-end of a nucleic acid molecule. The construct can be synthesized by direct conjugation of a macromolecular species onto the 3′-end of a nucleic acid by a polymerase or transferase like TdT (Sorensen, et al., 2013, Enzymatic Ligation of Large Biomolecules to DNA, ACS Nano, 7(9):8098-8104, incorporated herein by reference) or by the incorporation of a functionalized nucleotide which allows the specific modification of the nucleic acid via that functionality (Winz, et al., 2015, Nucleotidyl transferase assisted DNA labeling with different click chemistries, Nucleic Acids Res. 43(17):e110, incorporated herein by reference). 5′-end stoppers may be readily introduced at the time of chemical synthesis of an oligonucleotide adapter either via direct synthesis of a hairpin or via secondary modification of a functional handle introduced as the last step of the 3′ to 5′ oligonucleotide synthesis and may be used as an initiator. Alternatively, 5′-end stoppers can be constructed by attaching an oligonucleotide initiator via the 5′-end to a magnetic or non-magnetic bead or particle or nanoparticle, enzymatically synthesizing the homopolymer tract containing memory strand and then leaving the memory strand attached to the magnetic or non-magnetic bead or particle or nanoparticle.

Stoppers can be further modified to allow cleavage of the stopper from the rest of the molecule so that the nucleic acid strand can either passively diffuse out of the nanopore or be translocated out of the nanopore through the application of a voltage, thus allowing the strand to be recovered.

In certain embodiments, template independent polymerases or transferases can be used to modify pre-synthesized strands of nucleic acids to enable the use of nanopore devices as “Write Once, Read Many” (WORM) types of memory devices. One of the inherent issues associated with the use of nanopore devices as DNA sequencers is the high error rate they produce because of the poor discrimination of the nanopore. This may be due to the speed of translocation through the pore or the fact that the approximate depth of the nanopore is 8 nm, allowing for multiple bases to be present in the pore at the same time. The homopolymer memory strands of the invention address this issue through the use of homopolymer repeats, decreasing the need for strict sequencing accuracy. In certain embodiments, the shortcomings of nanopore sequencing may be addressed by attaching a hairpin adapter to one end of a double stranded DNA memory strand such that, during the translocation and base calling process, each sense of the DNA memory strand could be read, so that reading an individual base and its complement could compensate for the error rate of reading each base only once. In certain embodiments, nanopore sequencing fidelity may be increased by appropriate modification of each end (5′- & 3′-) of a single-stranded or double-stranded nucleic acid molecule with a bulky appendage that will not translocate across a pore (e.g., protein or solid state). The molecule may then be trapped within a pore and translocated forwards and backwards many times to allow multiple reads of the same molecule in the same pore, thus reducing the sequencing error rate by the square of the number of reads (if the sequencing read errors are due to stochastic origins).

Transferases like TdT may be used to append large and bulky modified nucleotide analogs to the 3′-end of a DNA molecule. In certain embodiments, a WORM nanopore memory device may be generated using the following steps: (1) generating a single molecule of DNA encoding specific information in any high density encoding scheme as discussed above and covalently modifying the 5′-end with a bulky molecular construct that prevents complete translocation of the DNA molecule through a nanopore; (2) threading the DNA molecule through the nanopore until the 5′-modified end is in contact with the nanopore and it cannot translocate any further; (3) using TdT and a modified nucleotide to covalently add one (or more) bulky nucleotide analogs (“stoppers”) to the 3′-end of the DNA molecule to effectively trap the molecule within the torus of the nanopore; (4) reversing the polarity of the current to the nanopores to clear out any DNA molecules that are not 3′-modified, thus creating a pure population of “trapped” (5′- & 3′-modified) nucleic acid strands; (5) removing any un-trapped nucleic acids from the vicinity of the nanopore through washing or other means; (6) reading the “trapped” DNA strand in either or both directions, using an applied voltage (potentially reading multiple times to reduce the error rate to an acceptable level). In various embodiments, step 6 may consist of a voltage induced “read” in one direction, and rapid translocation in the opposite direction to “rewind” the data encoding nucleic acid through the nanopore, followed by another voltage induced “read” in the original direction. This cycle of “read”-“rewind”-“read” can be repeated as many times as desired.
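One simple way to combine the repeated reads of step 6 is a per-position majority vote over the decoded tract symbols. The sketch below is illustrative only and is not described in the disclosure: it assumes that each read has already been decoded into an equally long list of tract symbols, and the function name `consensus` is hypothetical.

```python
from collections import Counter


def consensus(reads: list[list[str]]) -> list[str]:
    """Majority vote at each tract position across repeated reads."""
    if not reads:
        raise ValueError("at least one read is required")
    length = len(reads[0])
    if any(len(read) != length for read in reads):
        raise ValueError("all reads must contain the same number of tracts")
    # zip(*reads) yields one tuple of symbols per tract position
    return [Counter(column).most_common(1)[0][0] for column in zip(*reads)]


reads = [
    ["A", "C", "G", "T"],   # first read
    ["A", "C", "T", "T"],   # second read with one miscalled tract
    ["A", "C", "G", "T"],   # third read after a rewind
]
print(consensus(reads))
```

An odd number of reads avoids ties in the vote; with an even number, this sketch falls back on the symbol that appeared first among the tied candidates.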

In some embodiments, the trapped nucleic acid strand may be read during translocation in either direction. In some embodiments, the trapped strand can be translocated to one end of the molecule (either 5′- or 3′-) and read in the opposite direction, as such a polarity of reading may provide a higher accuracy read.

In certain embodiments, a circularized nucleic acid memory strand may be generated using synthesis methods described above followed by circularization. The circularized strand may comprise a bulky macromolecule or specific homopolymer sequence where the ends of the synthesized strand were joined in order to designate a start and stop point for data reading. Start and stop homopolymer sequences may also be used in linear nucleic acid strands. The circularized strand 305 may have been threaded between two adjacent nanopores (306 and 309) such that the circular strand 305 is physically trapped between the two nanopores (306 and 309) located on a single membrane 303 as shown in FIG. 6. The circularized memory strand may encode digital information as either sequences of single nucleotides, homopolymer tract sequences, sequences of modified nucleotide analogs, or some combination thereof. One nanopore 309 may be used to generate an electrical signal as the information-encoded memory strand is translocated through the pore, while the other nanopore 307 may simply act as a portal to allow the DNA molecule to return to the cis side of membrane 303 and first nanopore 309. One advantage of this scheme is that the information encoding strand may be recycled for repeated reading and can be read multiple times, thus reducing any possible read-out error.

Other embodiments may encode data in the memory strands so that it can be accessed only under a specific set of conditions. In such cases the memory strands are at least partially comprised of nucleotides containing modifications attached with a cleavable linker. Modifications (e.g., chemical protecting groups) and linkers can be selected so that if a polymer tract translocates through a nanopore without the correct treatment, the current blockades differ from the sequence that encodes the data. FIG. 7 outlines a scheme using disulfide and amide-linked modifications to a dG nucleotide and illustrates how the current blockade and data encoded in the memory strand may change in response to treatment conditions. G* and G** are structurally similar in size and flexibility and may produce similar current blockades on nanopore platforms, yet are removed under different conditions. G* and G*** are structurally different, yet the modifications share the same removal conditions. Other embodiments may employ other modifications or linkers that are cleavable with different treatments such as specific wavelengths of light, acidic or alkaline pH, oxidative or reductive conditions, or sequence-specific nucleases. Some embodiments may use the presence or absence of memory strand modifications for encryption or as a chemical marker of previous access or alteration to the data. Most linker cleavage reactions are effectively irreversible, so this approach is best suited for write-once read-many systems where single molecules may be sufficient to encode data without redundancy.

Many information encoding schemes are useful with the readout schemes of the invention and may be apparent to one skilled in the art based on the present disclosure.

Synthesis may be accomplished using acoustic delivery of drops into wells of plates (e.g., 1536 well plates of 1.5 μL each). In various embodiments, nucleic acid memory strands may be synthesized on a bead or a magnetic bead or a surface and either left on the bead or magnetic bead or surface after the full-length synthesis is complete or removed from the synthesis support, depending on the application.

In certain embodiments, systems for the synthesis of long (5-10 kb) data strands may use inkjet delivery to arrays of wells (e.g., multiple nanoliter volume wells). In other embodiments, multiple pneumatically controlled actuators can be positioned above each well to simultaneously deliver reagents to each position of an array. Each actuator would be served by a selector valve that would choose between each of the two or more nucleotides or modified nucleotides, formulated with a template-independent polymerase, that are used to specify the bits of the DNA data strand. One or more additional selector valve ports would be dedicated to one or more wash reagents if necessary. The array of nanoliter volume wells can be open at both ends, as long as the diameter of the wells is such that delivered liquids are trapped by capillary action within the length of the open-ended wells. After each round of nucleotide-enzyme formulation is delivered to the open-ended well, a rinse reagent or enzymatic reaction stop reagent can be flowed across and through the lower opening of the array of wells such that each well is rinsed of the reaction mixture, thus preparing the array for the next cycle of enzymatic synthesis. In other embodiments, a vacuum source is used to rapidly remove one reagent from the capillary nanowells prior to delivery of the next reagent.

Certain embodiments may use highly parallel nanofluidic chambers with valve-controlled reagent deliveries. An exemplary microfluidic nucleic acid memory strand synthesis device is shown in FIG. 8 for illustrative purposes and not to scale. Microfluidic channels 255, including regulators 257, couple reservoirs 253 to a reaction chamber 251, and an outlet channel 259, including a regulator 257, evacuates waste from the reaction chamber 251. Microfluidic devices for nucleic acid memory strand synthesis may include, for example, channels 255, reservoirs 253, and/or regulators 257. Nucleic acid memory strand synthesis may occur in a microfluidic reaction chamber 251 which may include a number of anchored synthesized nucleotide initiators which may include beads or other substrates anchored or bound to an interior surface of the reaction chamber and capable of optionally releasably bonding a polynucleotide initiator. The reaction chamber 251 may include at least one intake and one outlet channel 259 so that reagents may be added to and removed from the reaction chamber 254. The reaction chamber 251 should be temperature controlled to maintain optimal and reproducible enzymatic synthesis conditions. The microfluidic device may include a reservoir 253 for each respective dNTP or analog to be used in the memory chain coding scheme. Each of these reservoirs 253 may also include an appropriate amount of TdT or any other enzyme which elongates DNA or RNA strands in a template-independent manner. Additional reservoirs 253 may contain reagents for washing or other tasks.

The reservoirs 253 can be coupled to the reaction chamber 254 via separate channels 255, and reagent flow through each channel 255 into the reaction chamber 254 may be individually regulated through the use of gates, valves, pressure regulators, or other means. Flow out of the reaction chamber 254, through the outlet channel 259, may be similarly regulated. The reservoirs 253 may hold dNTPs, modified dNTPs or any analogs thereof described above suspended in a fluid at a known concentration such that the concentration of reagent may be strictly controlled based on the volume of reagent allowed to flow into the reaction chamber 254. Accordingly, the length of each homopolymer tract may be managed through control of the reagent concentration.

In certain instances, reagents may be recycled, particularly the dNTP and enzyme reagents. Reagents may be drawn back into their respective reservoirs 253 from the reaction chamber 254 via the same channels 255 through which they entered by inducing reverse flow using gates, valves, vacuum pumps, pressure regulators or other regulators 257. Alternatively, reagents may be returned from the reaction chamber 254 to their respective reservoirs 253 via independent return channels. The microfluidic device may include a controller capable of operating the gates, valves, pressure, or other regulators 257 described above.

An exemplary microfluidic nucleic acid memory strand synthesis reaction may include flowing a desired dNTP reagent (dNTP being used throughout to refer to any component molecule used to encode data in a nucleic acid memory chain of the invention) into the reaction chamber 254 at a predetermined concentration and for a predetermined amount of time (calculated to result in the desired homopolymer length) before removing the dNTP reagent from the reaction chamber 254 via an outlet channel 259 or a return channel (not shown); flowing a wash reagent into the reaction chamber 254; removing the wash reagent from the reaction chamber 254 through an outlet channel 259; flowing the next dNTP reagent in the desired memory strand sequence under conditions calculated to achieve the desired homopolymer tract ratio; and repeating until the desired nucleic acid memory strand has been synthesized. After the desired nucleic acid memory strand has been synthesized, it may be released from the reaction chamber anchor or substrate and collected via an outlet channel 259 or other means.
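The flow, remove, wash, remove cycle described above can be expressed as an ordered list of controller steps. The following is a hypothetical sketch and not the controller of the invention: the `Step` record, the action names, and the per-reagent hold times are assumptions introduced for illustration, and real hold times would come from calibration of the incorporation rate.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    action: str     # "flow_in", "remove", "wash_in" or "wash_out" (assumed names)
    reagent: str    # reservoir to draw from, or "wash"
    seconds: float  # dwell time before the next step


def synthesis_schedule(tracts: list[str],
                       hold_seconds: dict[str, float],
                       wash_seconds: float = 5.0) -> list[Step]:
    """Expand a tract sequence into reagent and wash steps, one cycle per tract."""
    steps = []
    for symbol in tracts:
        # Hold the dNTP-enzyme reagent long enough for the desired tract length
        steps.append(Step("flow_in", symbol, hold_seconds[symbol]))
        steps.append(Step("remove", symbol, 0.0))
        steps.append(Step("wash_in", "wash", wash_seconds))
        steps.append(Step("wash_out", "wash", 0.0))
    return steps


# Assumed, uncalibrated hold times for two reagents
schedule = synthesis_schedule(["A", "T", "A"], {"A": 60.0, "T": 45.0})
for step in schedule:
    print(step)
```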

Because of the significant number of homopolymer encoded DNA strands required to encode useable amounts of data, highly parallel methods of DNA synthesis are required. In some embodiments, as depicted in FIG. 9, a flow cell containing an array of wells (9-1) is formed on a suitable substrate by patterning horizontal and vertical stripes (9-2) of hydrophobic materials to form a plurality of hydrophilic wells (?) bordered by hydrophobic regions. Typical dimensions of the hydrophilic wells can be 300×300 nm to 1000×1000 nm. This hydrophilic array forms the floor of a flow cell with a gap of suitable dimensions between the floor and an optically transparent cover. A solution of cold (i.e., below the enzyme-specific temperature optimum) nucleotidyl transferase, one of natural or modified nucleotide triphosphates and any necessary co-factors is flowed into the flow cell through an inlet (9-3) such that upon cessation of fluid flow, the enzyme-nucleotide triphosphate solution beads up into spatially defined droplets (9-4) positioned above each hydrophilic region. An IR source (9-5) through a shaping lens (9-6) projects a beam (9-7) onto a DLP (digital light processing) device (9-8), which is used to simultaneously steer IR beams (9-9) to each of the hydrophobically constrained droplets (9-4) that is chosen to have a specific nucleotide added, resulting in the rapid heating of the polymerase extension reaction formulation to the temperature for maximum enzyme activity for a period of time defined to synthesize a homopolymer of the desired length. After some suitably defined reaction time, the IR source is shut off and a cold rinse buffer is rapidly injected into the flow cell and drained through an outlet (9-10), thus quenching the reaction and finishing one “write” cycle. This series of steps is repeated multiple times for each “write” cycle so that each nascent data strand is randomly accessed according to its spatial location and the chosen nucleotide to be added, until the full length homopolymer data strand is completed. In some embodiments, a DLP device with 1920×1080 steerable mirrors can be used to simultaneously randomly access ~2M synthesis positions in the synthesis flow cell. In some embodiments, the template-independent polymerase used is thermophilic with a reaction temperature optimum well above room temperature, such that enzymatic activity is minimized in the hydrophobically constrained droplets prior to the rapid heating of the droplet by the IR source. In some embodiments, the bottom surface of the flow cell, bearing the hydrophobically defined hydrophilic wells, is abutted to a cooling device that maintains the droplets in the hydrophilic wells at a reduced temperature to prevent enzymatic activity until the temperature is raised by the IR source.
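The addressing step of each “write” cycle amounts to selecting the droplets whose strands need the nucleotide currently in the flow cell. The sketch below is illustrative only: it assumes one steerable mirror per synthesis position, and the names `mirror_count` and `activation_mask` and the row and column addressing are hypothetical.

```python
def mirror_count(columns: int = 1920, rows: int = 1080) -> int:
    """Number of individually steerable mirrors, taken as synthesis positions."""
    return columns * rows


def activation_mask(next_symbol: dict[tuple[int, int], str],
                    delivered: str) -> set[tuple[int, int]]:
    """Positions to heat: wells whose next tract matches the delivered nucleotide."""
    return {pos for pos, symbol in next_symbol.items() if symbol == delivered}


print(mirror_count())  # 2073600, i.e. about 2M positions

# Next tract required at three example wells, addressed as (row, column)
pending = {(0, 0): "A", (0, 1): "T", (1, 0): "A"}
print(sorted(activation_mask(pending, "A")))
```

Wells outside the mask stay cold during the cycle, so their strands are not extended until a later cycle delivers the nucleotide they need.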

In another embodiment, a flow cell is composed of an array of wells that are formed on a suitable substrate by patterning horizontal and vertical stripes of hydrophobic materials, forming a plurality of hydrophilic spots bordered by hydrophobic regions. Typical dimensions of the hydrophilic spots can be 300×300 nm to 1000×1000 nm. Each hydrophilic spot is positioned over an individually addressable CMOS heater. This hydrophilic-CMOS heater array forms the floor of a flow cell with a gap of suitable dimensions between the floor and an optically transparent cover. A solution of cold (i.e., below the enzyme-specific temperature optimum) nucleotidyl transferase and one of natural or modified nucleotide triphosphates is flowed into the flow cell such that upon cessation of fluid flow, the polymerase extension reaction solution beads up into spatially defined droplets positioned above each hydrophilic region with an associated CMOS heater. Each of the hydrophobically constrained droplets that is chosen to have a specific nucleotide added is rapidly heated to the temperature for maximum enzyme activity for a period of time defined to synthesize a homopolymer of the desired length. After some suitably defined reaction time, the heater is shut off and a cold rinse buffer is rapidly injected into the flow cell, thus quenching the reaction and finishing one “write” cycle. This series of steps is repeated multiple times for each “write” cycle so that each nascent data strand is randomly accessed according to its spatial location and the chosen nucleotide to be added, until the full length homopolymer data strand is completed. In some embodiments, the enzyme used is thermophilic with a temperature optimum well above room temperature, such that there is a low probability of unwanted nucleotide addition in the hydrophobically constrained droplet prior to the rapid heating of the droplet by the CMOS heater.

As one skilled in the art would recognize as necessary or best-suited for the systems and methods of the invention, systems and methods of the invention may include computing devices as shown in FIG. 10 that may include one or more of processor 309 (e.g., a central processing unit (CPU), a graphics processing unit (GPU), etc.), computer-readable storage device 307 (e.g., main memory, static memory, etc.), or combinations thereof which communicate with each other via a bus. Computing devices may include mobile devices 101 (e.g., cell phones), personal computers 901, and server computers 511. In various embodiments, computing devices may be configured to communicate with one another via a network 517.

Computing devices may be used to control the synthesis of memory strands, the reading of sequenced memory strands, and the compiling or translating of data between human or machine-readable formats, digitized data, and nucleic acid sequences, among other steps described herein. Computing devices may be used to display the readable format of data.

A processor 309 may include any suitable processor known in the art, such as the processor sold under the trademark XEON E7 by Intel (Santa Clara, Calif.) or the processor sold under the trademark OPTERON 6200 by AMD (Sunnyvale, Calif.).

Memory 307 preferably includes at least one tangible, non-transitory medium capable of storing: one or more sets of instructions executable to cause the system to perform functions described herein (e.g., software embodying any methodology or function found herein); data (e.g., data to be encoded in a memory strand); or both. While the computer-readable storage device can in an exemplary embodiment be a single medium, the term “computer-readable storage device” should be taken to include a single medium or multiple media (e.g., a centralized or distributed database, and/or associated caches and servers) that store the instructions or data. The term “computer-readable storage device” shall accordingly be taken to include, without limit, solid-state memories (e.g., subscriber identity module (SIM) card, secure digital card (SD card), micro SD card, or solid-state drive (SSD)), optical and magnetic media, hard drives, disk drives, and any other tangible storage media.

Any suitable services can be used for storage 527 such as, for example, Amazon Web Services, memory 307 of server 511, cloud storage, another server, or other computer-readable storage. Cloud storage may refer to a data storage scheme wherein data is stored in logical pools and the physical storage may span across multiple servers and multiple locations. Storage 527 may be owned and managed by a hosting company. Preferably, storage 527 is used to store records 399 as needed to perform and support operations described herein.

Input/output devices 305 according to the invention may include one or more of a video display unit (e.g., a liquid crystal display (LCD) or a cathode ray tube (CRT) monitor), an alphanumeric input device (e.g., a keyboard), a cursor control device (e.g., a mouse or trackpad), a disk drive unit, a signal generation device (e.g., a speaker), a touchscreen, a button, an accelerometer, a microphone, a cellular radio frequency antenna, a network interface device, which can be, for example, a network interface card (NIC), Wi-Fi card, or cellular modem, or any combination thereof.

One of skill in the art will recognize that any suitable development environment or programming language may be employed to allow the operability described herein for various systems and methods of the invention. For example, systems and methods herein can be implemented using Perl, Python, C++, C#, Java, JavaScript, Visual Basic, Ruby on Rails, Groovy and Grails, or any other suitable tool. For a computing device 101, it may be preferred to use native Xcode or Android Java.

FIG. 11 shows the polyacrylamide gel electrophoresis analysis of two different single homopolymer tracts made via enzymatic synthesis. Lane A is a sample of a starting 20-mer oligonucleotide that is used in all following lanes. Lane B is a sample from a TdT reaction containing a 20-mer oligonucleotide and the non-reversible terminator ddATP showing the formation of a 21-mer. Lane C is a sample from a TdT reaction containing a 20-mer and the natural nucleotide dATP after 1 minute at 37° C. Lane D is a sample of the same reaction mixture in Lane C after 5 minutes at 37° C. Lane E is a sample of the same reaction mixture in Lane C after 15 minutes at 37° C. Those three lanes illustrate homopolymer length control due to consumption of the input dATP in ~5 minutes during the TdT extension reaction and the observation of no homopolymer growth between 5 and 15 minutes. Lane F is a sample from a TdT reaction containing a 20-mer oligonucleotide and the nucleotide analog N6-benzoyl-dATP after 1 minute at 37° C. Lane G is a sample of the same reaction mixture in Lane F after 5 minutes at 37° C. Lane H is a sample of the same reaction mixture in Lane F after 15 minutes at 37° C. Although there are qualitative differences between the lengths of dA^(Bz) homopolymers formed in Lanes F-H, the same length control is demonstrated even with an N6-modified dATP analog.

The writing of digital data into a molecular storage format based on molecular approaches offers advantages over currently used storage media like tape or disk. DNA based storage has been sparking interest because of the high information density achievable, the extremely long lifetime and the low energy consumption during dormant periods. To date, most efforts to use synthetic DNA as a storage medium have involved chemical synthesis using the popular phosphoramidite method.

DNA based data storage can require a vastly larger number of strands than are currently synthesized for the existing research markets. Any synthesis technology (chemical or enzymatic) that depends upon the removal of a nucleotide blocking group or terminator imposes additional steps and complexity since reagents must be delivered to an array of synthesis features (i.e., wells or spots) in an addressable fashion to direct the correct nucleotide to the correct location on an array. Array based methods of synthesis can be performed in one of several ways: 1) bulk delivery of activated reactants followed by selective removal of a blocking group or 2) addressable delivery of activated reactants followed by bulk blocking group removal or 3) bulk delivery of inactive reactants followed by addressable activation. The addressable delivery of reagents to large (10⁴-10⁶) 2-D arrays is generally achieved using inkjet deposition to each desired location. If the data encoding scheme uses four nucleotides, then four separate write heads must be used and indexed in a complex X-Y mechanical fashion. Additionally, the use of inkjet delivery limits the dimensions between each feature (well or spot) to low tens of microns. The challenge in this process is to decrease the step time for each synthesis cycle, no matter how it is achieved.

In certain embodiments, systems and methods of the invention may include delivering an inactive reaction mixture to every feature on a 2-D array, then selectively activating only those features that require the addition of an A or G or C or T. Addressable methods of delivery, activation or blocking group removal that do not rely on mechanical movement are preferred. A preferred embodiment uses bulk delivery of reactants and selective activation of specific synthesis features by removal of a blocking group from a nucleotide analog that then allows rapid DNA polymerase mediated incorporation and formation of a homopolymer bit as shown in FIG. 15, thus enabling the highly parallel and rapid synthesis of homopolymer encoded nucleic acid memory strands. There are several other advantages to homopolymer encoded nucleic acid information polymers: 1) homopolymer encoded bits overcome the error profiles associated with next generation sequencing, and 2) the resulting polymers are unnatural nucleic acids and thus cannot be repurposed for bioterrorist activities.

Delivery and selective activation of template independent polymerase DNA synthesis can be done serially (deliver A -> addressably activate and initiate homopolymer synthesis -> wash; deliver C -> addressably activate and initiate homopolymer synthesis -> wash; deliver G -> addressably activate and initiate homopolymer synthesis -> wash; deliver T -> addressably activate and initiate homopolymer synthesis -> wash) or in parallel (deliver all four nucleotides to all features simultaneously -> addressably activate A, C, G, T either simultaneously or serially -> wash). In this fashion, the synthesis cycle becomes very efficient and involves only three steps: reactant delivery, incorporation reaction activation, then wash before the next cycle starts. The reaction is halted either by rapid removal of the reactants with a gas or a liquid or by the rapid delivery of a quenching reagent. In a preferred embodiment, the quenching reagent is a metal chelator.
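As an illustration only, the serial mode described above can be sketched as a schedule generator. The feature identifiers, the function name and the choice to skip deliveries that no feature needs are assumptions made for this sketch and are not part of the described apparatus.

```python
from typing import Dict, List, Tuple

DELIVERY_ORDER = "ACGT"  # delivery order used in the serial scheme described above


def serial_schedule(targets: Dict[str, str]) -> List[Tuple[int, str, List[str]]]:
    """Return (cycle, nucleotide, features_to_activate) steps.

    `targets` maps a hypothetical feature id to the tract sequence to be
    written there, one letter per homopolymer tract.
    """
    steps: List[Tuple[int, str, List[str]]] = []
    n_cycles = max((len(seq) for seq in targets.values()), default=0)
    for cycle in range(n_cycles):
        for nt in DELIVERY_ORDER:
            # Features whose tract at this position uses the delivered nucleotide.
            active = sorted(
                f for f, seq in targets.items()
                if cycle < len(seq) and seq[cycle] == nt
            )
            if active:  # assumption: skip a delivery that no feature needs
                steps.append((cycle + 1, nt, active))
    return steps


if __name__ == "__main__":
    for step in serial_schedule({"well_0": "ACTG", "well_1": "AGTC"}):
        print(step)
```

Each emitted step corresponds to one deliver, activate, wash sequence; in the parallel mode, the four per-nucleotide steps of a cycle would collapse into a single delivery with per-feature activation.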

A preferred design for an apparatus that can be used to synthesize homopolymer bit encoding memory strands consists of a flow cell that is composed of a 2-D array of hydrophobically patterned wells that are suitably modified to support template independent enzymatic synthesis. After delivery of reactants to the 2-D array of hydrophobic wells, liquids will bead up, defining spatially distinct reaction zones as shown in FIG. 16.

A preferred embodiment is one in which the bottom surface of the well, enclosed on four sides by hydrophobic patterning, is modified with a covalently attached oligonucleotide initiator in a 5′→3′ orientation, with the 5′-terminus attached to the bottom surface of the well. In another embodiment, the well is physically formed by etching a cavity in the bottom surface of the flow cell, in which case the oligonucleotide initiator is covalently attached to the surfaces (bottom and sides) of the well. In some cases, the well is open at the bottom to allow for liquid flow through the well. In other cases, the well is closed at the bottom. Nucleotide analogs that are protected at the 3′-OH are generally inactive with commercially available or WT TdT enzyme (U.S. Pat. No. 10,059,929). Thus, a mixture of a 3′-blocked dNTP analog, TdT protein and suitable co-factors can be mixed together in the presence of an initiator oligonucleotide at 37° C. with little to no homopolymer formation. Once the 3′-OH is “decaged” (i.e., unblocked) by removal of the protecting or blocking group, the resulting nucleotide is available for free running incorporation and homopolymer formation. Decaging methods that are “delivery-less” are preferred. Analogs that are constructed to allow addressable 3′-OH decaging by administration of or exposure to an activating energy such as light, heat, electrochemical generation of pH change, or a reducing agent are preferred. Each of these decaging or unblocking reactions can be accomplished in flow cells that are appropriately constructed with either mechanically steerable light through light transparent covers (e.g., FIG. 17), individually addressable heaters (e.g., FIG. 18), electrochemically induced pH change or the generation of reducing conditions.

Each of the flow cell designs illustrated is compatible with a method of addressable decaging of a nucleotide contained in a droplet constrained by the hydrophobic patterning around it. FIG. 19 illustrates an apparatus that could be used with a flow cell designed for the use of light decaged nucleotide analogs. Other embodiments and configurations known to one skilled in the art are possible for this purpose.

Each decaging mechanism requires nucleotide analogs specifically designed for the physicochemical process selected for use in the system:

In some embodiments, nucleotide analogs suitable for light mediated decaging are 3′-O-(2-nitro-benzyl)-dNTP. In some embodiments, nucleotide analogs suitable for heat mediated decaging are 3′-O-(tetrahydrofuranyl)-dNTP. In some embodiments, nucleotide analogs suitable for reduction mediated decaging are 3′-O-methyl-dithiomethyl-dNTP. Many other 3′-OH protecting groups are suitable, as long as the resulting 3′-OH modified nucleotide analog is not a substrate for a template independent polymerase and the protecting group is readily removed by “delivery-less” methods like light, heat or electrochemically generated reactants. The composition of these dNTP analogs is different from that described in WO 2016/034807, insofar as the 3′-O modifications described therein are explicitly stated to be substrates for a template independent polymerase and are called reversible terminators. The subject of this invention is 3′-O-blocking groups that are explicitly not substrates for a polymerase and serve to cage the dNTP until removed, thus allowing polymerization to proceed. Mathews A S et al (2016) describe the use of 3′-O-(2-nitrobenzyl)-dNTP analogs as reversible terminators for the controlled enzymatic synthesis of natural, non-homopolymer oligonucleotides. They explicitly teach the use of such nucleotides as substrates for DNA synthesis by using extremely long (ca. 1 hour) enzymatic reaction times, and furthermore explicitly teach their use as reversible terminators, in contrast to the subject of this patent, which teaches the use of these analogs as caging groups to initiate enzymatic polymerization of multiple nucleotides.

Initiation of free running homopolymer synthesis can be achieved by methods other than caged dNTP analogs. Some embodiments can use fluidic pulses of polymerase to start and stop ssDNA synthesis as described in Church U.S. Pat. No. 9,928,869 (2018). In Reza et al WO 2017/196783, template dependent enzymatic synthesis is initiated by activation of polymerase by an electrochemically generated pH change. Although modified nucleotide analogs, including 3′-O-reversible terminators, are described, neither patent teaches the activation of homopolymer synthesis using decaging of caged dNTP analogs. Church U.S. Pat. No. 9,928,869 (2018) describes free running homopolymer synthesis using natural dNTPs and controlling the length of homopolymers formed by the reaction duration. Lee H R et al (2018) describe the use of mechanical delivery to deposit natural nucleotides to a plurality of reaction zones on a 2D array and control the length of homopolymer formation by actively destroying the unreacted unmodified dNTP with apyrase. In some embodiments of the subject of this patent, homopolymer rate modulating modified dNTP analogs with unmodified 3′-OH are used in combination with either fluidic pulses or mechanical XY delivery or pH activated template independent DNA polymerases.

FIGS. 20 and 21 show the kinetics of light mediated decaging. As the 3′-caged dNTP in the template independent polymerase reaction mixture is converted to a substrate for the polymerase, it is polymerized into the nascent homopolymer strand. Complete conversion of the caged dNTP to uncaged (substrate) dNTP is not required, as available uncaged dNTP is readily incorporated by the polymerase. Once decaged, the length of the desired homopolymer is controlled by controlling the concentration of decaged dNTP generated and/or the time of the enzymatically mediated incorporation reaction.

FIG. 22 shows examples of two nucleotide analogs that are caged with 3′-O modifications that are able to be decaged with visible wavelengths of light and exhibit tunable photophysical properties (Peterson J. A., et al J. Amer. Chem. Soc. 2018 140:7343-6), thus allowing for the simultaneous introduction, decaging and synthesis of two different homopolymer strand sequences. In some embodiments, nucleotide analogs modified with two different 3′-caging species, capable of being decaged by two different wavelengths of light, are delivered simultaneously to a 2D array of synthesis locations. In another embodiment, four dNTP analogs modified with four different light removable 3′-caging species are delivered simultaneously to a 2D array of synthesis locations and decaged with four different wavelengths of light. Decaging reactions can be carried out sequentially on one set of locations on the 2D array followed by delivery of the two remaining dNTPs and subsequent sequential decaging, thus completing one round of increasing the length of all information polymers by one nucleotide. In an alternative embodiment, four dNTP analogs modified with four different 3′-caging species can be delivered simultaneously and decaged by the simultaneous exposure of each synthesis feature to one of four different wavelengths of light. A hardware configuration like that shown in FIG. 19 can be used for multicolor decaging by the incorporation of two or more wavelength specific light sources and an appropriate shuttering mechanism to direct one of the two or more light sources onto the 2D light directing system.

FIG. 23 shows the synthesis of a 12-bit information strand consisting of consecutive cycles of homopolymer synthesis using dCTP, dGTP, dTTP interspersed with UV light decaged 3′-oNBn-dATP at cycles #1, 6, 8, 10, 12. The incorporation reaction can be terminated by several methods, including but not limited to the rapid removal of the enzymatic reaction components by introduction of a bolus of either a gas or a liquid. In some embodiments the liquid simply rinses away the reaction components, while in other embodiments the rinse liquid contains active quenching agents like EDTA or other enzymatic inhibitors.

In another embodiment, the method of caging the dNTP can be mediated by a steric mechanism that involves a modification to the nucleotide base instead of the 3′-OH. In a preferred embodiment, the nucleotide, with an unmodified 3′-OH, is modified with a removable steric blocking group (incorporation blocker) at the N6, N4, N2, or O4 of A, C, G, or T respectively that cages the dNTP. Homopolymer synthesis can be initiated by either light mediated, heat mediated or reduction mediated cleavage of the steric blocking group, thus “uncaging” a natural nucleotide suitable for free running homopolymer synthesis.

In some embodiments, the modified nucleotide comprises a linker containing one portion that renders the dNTP caged and thus un-incorporable until removed, and another portion that remains covalently bound throughout the multiple homopolymer tract synthesis. In some embodiments, the purine or pyrimidine base is modified at two locations, one modification caging the nucleotide and preventing enzymatic incorporation, the other remaining covalently bound throughout the polymer synthesis; each modification is removable by a different mechanism, allowing selective removal of each. In some embodiments, the purine or pyrimidine base is modified at two locations, each one removable by a different mechanism, allowing selective removal of each.

Suitable dNTP analog modifications can be designed that will result in a nucleotide with either no scar or with a scar, depending upon the sequence detection modality to be used for reading the homopolymer bit data strand. For readout methods that rely on Sequencing By Synthesis (SBS), all modifications to the purines or pyrimidines must be removed after homopolymer synthesis but prior to SBS readout. For readout methods that rely on current modulation during polymer translocation through one or more nanopores, modifications to purines and pyrimidines that are readily distinguishable from each other are desired. In some embodiments, modifications are made to the purine and pyrimidine base that modulate the polymerase kinetics during homopolymer synthesis. These “rate modulating” modifications may also act as current modulators for nanopore detection, or they may be removed for SBS detection.

Regardless of the mechanism of activation, controlling the length of free running homopolymer synthesis is important. Ideally, homopolymers of 2 to 4 nucleotides are desirable. In the presence of natural nucleotides, TdT has a propensity for forming homopolymers based on the nucleobase corresponding to the Km of the individual nucleotides (A>T>G>C). As noted by Lee H R et al (2018), it is difficult to limit homopolymer growth without increasing the deletion frequency. The above authors report ~66% deletion errors in their attempts to limit homopolymer growth to 2-3 nucleotides. Although TdT is reported to operate in a distributive manner, leading to Poisson distributions of homopolymer length in free running synthesis, in practice it is hard to drive complete conversion of the initiator nucleic acid during homopolymer synthesis without long enzymatic extension reaction times. Too short a free running homopolymer synthesis reaction time leads to bit deletion errors as reported above. Too long a homopolymer synthesis reaction time leads to excessively long homopolymer formation and inefficient data density. One solution is to use modified nucleotide analogs that modulate the rate of multiple base incorporation but do not require removal at every step, thus allowing for complete conversion of the initiating nucleic acid, control of the length of the homopolymer formed, and preservation of the simple two step cycle that is the subject of this patent. Ideally, homopolymer tracts in information polymers are greater than two nucleotides but four or fewer nucleotides in length.
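The trade-off between deletions and overly long tracts can be illustrated with a minimal numerical sketch. It assumes, as an idealization of the distributive behavior mentioned above, that the number of nucleotides added to each strand in one cycle follows a Poisson distribution with a given mean; real TdT kinetics may deviate from this, and the function names are chosen only for this example.

```python
import math


def poisson_pmf(k: int, mean: float) -> float:
    """Probability of exactly k additions under a Poisson model."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


def deletion_probability(mean: float) -> float:
    """Probability that a strand receives no nucleotide at all (a bit deletion)."""
    return poisson_pmf(0, mean)


def window_probability(mean: float, lo: int = 2, hi: int = 4) -> float:
    """Probability that the tract length falls in the desired lo..hi window."""
    return sum(poisson_pmf(k, mean) for k in range(lo, hi + 1))


if __name__ == "__main__":
    for mean in (1.0, 3.0, 6.0):
        print(
            f"mean={mean}: P(deletion)={deletion_probability(mean):.3f}, "
            f"P(2..4)={window_probability(mean):.3f}"
        )
```

Under this model a short mean extension gives many deletions, while a long mean extension pushes most tracts past four nucleotides, which is the motivation for rate modulating analogs.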

If necessary, the homopolymer length limiting modifications can be removed from all nucleotides at the end of the synthesis, thus generating a natural DNA molecule suitable for SBS detection. Modifications that are chemically compatible with the decaging conditions described above are most desirable; they should be removable by chemical conditions that are orthogonal to those used for decaging. In some embodiments, the 3′-OH is caged by 3′-O-(2-nitrobenzyl), while the N6 of dA, the N4 of dC, the N2 of dG and the O4 or N3 of dT are modified with non-terminating moieties that modulate multiple additions during free running enzymatic homopolymer synthesis. FIG. 24 shows the comparison of free-running homopolymer synthesis in the presence and absence of rate modulating modifications. In some embodiments, the homopolymer synthesis rate modulating modifications are the same for all four nucleotide analogs, while in another embodiment, the modifications are different for each of the dATP, dCTP, dGTP and dTTP analogs. In some embodiments, the rate limiting modifications remain in place during the entire synthesis.

FIGS. 25 and 26 show the results of two consecutive addition cycles of two differently modified nucleotide analogs. In another embodiment, the nascent information polymer is exposed to periodic or alternating cycles with chemical conditions that result in partial or complete removal of the homopolymer rate limiting modifications. In some embodiments of the use of Class 2 dNTP analogs, previously incorporated rate modulating modifications are removed simultaneously with polymerase extension by the inclusion of a mild reducing agent in alternating cycles of enzymatic extension reaction mixture.

If a non-SBS method of readout is to be used (e.g., nanopores), the homopolymer rate modulating modifications could be covalently attached with non-cleavable linkages and left in place during the readout. In some embodiments, homopolymer rate modulating modifications are also used to encode information during the readout process. In a preferred embodiment, the homopolymer synthesis rate modulating analogs are also designed to modulate the current during nanopore translocation and, furthermore, are selected to provide two or more detectable and distinguishable current blockade levels. In another embodiment, only a single type of nucleotide (i.e., A or C or G or T) is used in information strand synthesis and the homopolymer bits are encoded by two or more differential current blockade generating modifications.

FIGS. 27-30 show an example of the detection by a solid state nanopore of an enzymatically synthesized homopolymer of peptide-modified dNTP analogs. A dTTP analog modified with a cleavable disulfide linkage to a five amino acid peptide (N-Ac-CYPEE) was used to generate a fully modified molecule via free running homopolymer synthesis of >100 nt in length. Translocation through a 2D silicon nitride 30 nm thick nanopore at 500 mV resulted in a large signal deflection compared to an unmodified dU homopolymer (courtesy of Goeppert LLC, Philadelphia, Pa.).

The following table describes six different classes of modified dNTP analogs that are useful for the three different “delivery-less” homopolymer synthesis activation approaches and two different homopolymer encoded nucleic acid memory strand readout technologies.

| | Analog Class I | Analog Class II | Analog Class III | Analog Class IV | Analog Class V | Analog Class VI |
| Readout | SBS or nanopore detection | SBS or nanopore detection | SBS or nanopore detection | SBS or nanopore detection | Only nanopore detection | Only nanopore detection |
| dNTP Analogs | dATP, dCTP, dGTP, dTTP | dATP, dCTP, dGTP, dTTP | dATP, dCTP, dGTP, dTTP | dATP, dCTP, dGTP, dTTP | Only one dNTP | Only one dNTP |
| 3′-OH | Removable incorporation blocker | Removable incorporation blocker | unmodified | unmodified | Removable incorporation blocker | unmodified |
| Base | unmodified | Removable incorporation rate modulator | Removable incorporation blocker | Removable incorporation blocker; orthogonally removable incorporation rate modulator | >2 differentially detectable non-removable modifications | >2 differentially detectable non-removable modifications; orthogonally removable incorporation blocker |

Below are shown examples of four light mediated decaging nucleotide analogs that make up Class I:

This class of analogs demonstrates 3′-O-caged dNTPs that become natural non-modified dNTPs upon light mediated decaging. Once decaged, the rate of homopolymer formation will be no different than that of a natural nucleotide, and the length of the homopolymers formed must be modulated by additives to the template independent polymerase reaction formulation. In some embodiments, tetrahydropyranyl, 4-methoxy-tetrahydropyranyl, tetrahydrofuranyl, acetyl, methoxyacetyl, or phenoxyacetyl modifications are useful for the heat-triggered decaging of 3′-O blocked dNTP analogs. In some embodiments, 3′-O-analogs like -O-CH₂-S-S-R are useful for electrochemically mediated decaging by reductive conditions. Class I dNTP analogs are characterized by nucleotides that result in unmodified homopolymers that are suitable for readout by either classical Sequencing by Synthesis or by ratcheting style nanopores. In both instances, accuracy in the length of the homopolymers is not necessary, but the accurate detection of transitions between the homopolymers is required.
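The point that only transitions matter, not tract lengths, can be illustrated with a short sketch. It assumes an error-free read given as a plain string of bases; the function names are hypothetical and chosen only for this example.

```python
from itertools import groupby
from typing import List, Tuple


def collapse_tracts(read: str) -> str:
    """Reduce each run of identical bases to a single symbol."""
    return "".join(base for base, _ in groupby(read))


def tract_lengths(read: str) -> List[Tuple[str, int]]:
    """List each homopolymer tract with its observed length."""
    return [(base, len(list(run))) for base, run in groupby(read)]


if __name__ == "__main__":
    # Two reads with different tract lengths carry the same tract sequence.
    print(collapse_tracts("AAACCTTTTGG"))  # ACTG
    print(collapse_tracts("AACCCCTTGGG"))  # ACTG
    print(tract_lengths("AAACCTTTTGG"))
```

Because the collapsed string depends only on where the base changes, variation in the number of nucleotides per tract does not alter the decoded symbol sequence, provided every transition is detected.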

Below are shown examples of four Class II light mediated decaging nucleotide analogs, with orthogonally removable peptide rate modulating modifications:

Peptide modifications for use with Class II dNTP analogs are meant to be removed at the end of the information polymer synthesis for subsequent interrogation by either sequencing methods that rely on SBS or by translocation through a nanopore sensitive enough to detect the transitions from one homopolymer type (A, C, G, T) to the next. Peptides suitable for use in this application slow down the rate of incorporation of modified nucleotide analogs, to prevent long homopolymer formation prior to complete conversion of unmodified strands or modified strands of a different composition. In some embodiments, the incorporation rate modulating modifications may consist of 1, 2, 3, or 4 different peptides for the four nucleotide analogs. In some embodiments, the peptides are linked to the nucleotide through a disulfide linker, with a self-immolating scar, cleavable under mild chemical conditions. In some embodiments, the peptides may consist of, but are not limited to, Ac-EECGY (SEQ ID NO. 5), Ac-EEGCGW (SEQ ID NO. 6), Ac-c, Ac-EC-pNA (SEQ ID NO. 7), Ac-CWEE (SEQ ID NO. 8), Ac-CYPEE (SEQ ID NO. 9), Ac-EEGCPPW (SEQ ID NO. 10), Ac-CPYEE (SEQ ID NO. 11), Ac-CPWEE (SEQ ID NO. 12), Ac-CWPEE (SEQ ID NO. 13). Many other peptide sequences and compositions are possible to one skilled in the art. Peptides with an overall anionic composition seem most suitable; peptides with cationic composition accelerate the rate of incorporation of nucleotides, as previously noted by Finn P S et al 2003. Peptides with covalent linkage to the nucleotides other than through a disulfide to a cysteine amino acid are possible, so long as the ability to remove them from the nucleotides of the completed homopolymer DNA strands without leaving a residual scar is maintained. Particularly useful are disulfide cleavage mediated self-immolating linkers that eliminate under mild conditions by the formation of a thiolactone.

Below are examples of four nucleotides of Class II, with light mediated decaging and orthogonally removable non-peptide, homopolymer synthesis rate modulating modifications:

Non-peptide modifications to the N6 of Adenine, the N4 of Cytidine, the N2 of Guanine and the N3 of Thymine can act as incorporation rate modulators. In some embodiments, acetyl, diacetyl, isobutyryl, propionyl, pivaloyl, benzoyl, cyclohexyl and other organic modifiers are useful. Modifiers of this type are removed after information polymer synthesis by methods including, but not limited to, ammonolysis. Class II dNTP analogs are characterized by nucleotides that result in unmodified homopolymers that are suitable for readout by either classical Sequencing by Synthesis or by ratcheting style nanopores. In both instances, accuracy in the length of the homopolymers is not necessary, but the accurate detection of transitions between the homopolymers is required.

Below are examples of two Class III nucleotide analogs, with unmodified 3′-OH and removable incorporation caging modifications on a purine or pyrimidine:

In some embodiments, the caging modifications to the 2-nitrobenzyl function that are useful are bulky modifications like, but not limited to, peptides, cyclic peptides, PEG, branched PEG, star PEG, dendrimers, and nanoparticles. Class III dNTP analogs are also characterized by nucleotides that result in unmodified homopolymers that are suitable for readout by either classical Sequencing by Synthesis or by ratcheting style nanopores. In both instances, accuracy in the length of the homopolymers is not necessary, but the accurate detection of transitions between the homopolymers is required.

Below is an example of one Class IV nucleotide analog, with unmodified 3′-OH, removable incorporation blockers at positions other than the 3′-OH and orthogonally removable homopolymer synthesis rate modulator modifications also not located at the 3′-OH:

Nucleotide analogs of Class IV are designed to have one or more incorporation blocking modifications that cage the nucleotide from enzymatic incorporation until removed by light, heat or electrochemical means. Large, bulky modifications that render the nucleotide analog inactive to polymerase incorporation are preferred. In some embodiments, the one or more modifications are derivatives of 2-nitrobenzyl that can be removed by photolysis. Removal of the caging modifications at every cycle leaves incorporation rate modulating modifications that are removed at the completion of the information polymer synthesis by a chemical method orthogonal to that used to decage the nucleotide analog. In a preferred embodiment, the incorporation blocking modifications are covalently attached to the incorporation rate modulating modifications. In some embodiments, caging modifications to the 2-nitrobenzyl function that are bulky modifications and act as steric blockers are useful. Examples of modifications that can act as steric blockers are, but are not limited to, peptides, proteins, peptoids, PEG, branched PEG, star PEG, dendrimers, or nanoparticles.

Below are examples of Class V pyrimidine nucleotides, with 3′-O-caging modifications and non-removable base modifications:

In the case where nanopore detection by direct translocation is desired as the readout method, the nucleotides are doubly modified with a suitable removable 3′-O-caging group and with a covalent, non-removable modification that acts as a blockade current modulator during direct translocation through a non-indexing nanopore. Suitable modifications may be peptides or non-peptides. The ideal embodiment for a blockade current modulating group is the smallest possible modification that gives the most unique signal in the shortest possible homopolymer stretch. A key innovation of this class of dNTP analogs is that only one type of modified nucleotide is required, because the “sequence” of the homopolymer is encoded by the sequence of two or more current modulating modifications, not the nucleotide itself.

Below are examples of Class VI nucleotide analogs, with unmodified 3′-OH, removable incorporation blocking (caging) modifications and two or more different types of non-removable current modulating modifications, useful for detection by direct nanopore sequencing. A key innovation of Class VI dNTP analogs is that only one type of modified nucleotide is required, because the “sequence” of the homopolymer is encoded by the sequence of two or more current modulating modifications, not by the nucleotide itself. Encoding is not limited to base 2 or base 4, but only by the number of current modulating modifications that can be made to a single purine or pyrimidine nucleotide.
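As a rough illustration of why a larger alphabet of current modulating modifications helps, the sketch below counts how many tracts are needed to carry a payload when each tract can take one of k distinguishable values. It is a simplified capacity estimate under the assumption that every tract is read independently; it does not address how two adjacent tracts carrying the same modification would be told apart, which the text above does not specify. The function name is hypothetical.

```python
def tracts_needed(n_bits: int, k: int) -> int:
    """Smallest number of base-k symbols able to represent n_bits bits."""
    if n_bits < 0 or k < 2:
        raise ValueError("need n_bits >= 0 and k >= 2")
    tracts, capacity = 0, 1
    # Grow the strand one tract at a time until k**tracts covers 2**n_bits values.
    while capacity < 2 ** n_bits:
        capacity *= k
        tracts += 1
    return tracts


if __name__ == "__main__":
    for k in (2, 3, 4, 8):
        print(f"k={k}: {tracts_needed(12, k)} tracts for a 12-bit payload")
```

Integer arithmetic is used instead of logarithms so that exact powers, such as 4⁶ = 2¹², are not affected by floating point rounding.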

In some embodiments, the incorporation blocking (caging) modification is removable by either light, heat or electrochemical means, while the two or more nanopore detection elements are non-removable and survive repeated exposure to the decaging conditions used at every cycle during the homopolymer strand synthesis. Caging modifications to the 2-nitrobenzyl that act as steric blockers are bulky modifications like, but not limited to, peptides, proteins, peptoids, PEG, branched PEG, star PEG, dendrimers, or nanoparticles.

EXAMPLES

Example 1

N⁶-benzoyl-deoxyadenosine triphosphate was prepared by charging a vial with N⁶-benzoyl-2′-deoxyadenosine (0.055 g, 0.16 mmol) under dry N₂, and trimethyl phosphate (0.435 mL) was added. To the resulting solution was added tributylamine (0.077 mL, 0.32 mmol), and the reaction mixture was flushed with dry N₂ for 30 min while being held at −5° C. To this vial was added anhydrous phosphorus oxychloride (0.018 mL, 0.19 mmol) via syringe, and the reaction mixture was stirred at −5° C. for 3 min. A second aliquot of anhydrous phosphorus oxychloride (0.009 mL, 0.10 mmol) was added via syringe, and the reaction mixture was stirred at −5° C. for 8 min. A second vial was charged with tributylamine pyrophosphate (0.075 g, 0.14 mmol), flushed with dry N₂, and anhydrous acetonitrile was added (0.609 mL), followed by tributylamine (0.231 mL, 0.97 mmol). The prepared tributylamine pyrophosphate mixture was cooled to −20° C., added to the reaction mixture and allowed to react for 10 min. The reaction was quenched by the dropwise addition of H₂O (4.35 mL). The contents of the flask were combined with 0.87 mL of H₂O and extracted with dichloromethane (3×150 mL). The aqueous phase was adjusted to pH 6.5 with concentrated NH₄OH and stirred for 12 h at 4° C. The mixture was transferred to a 250 mL round bottom flask with 50 mL of water and concentrated under reduced pressure. The residue was dissolved in 40 mL water and purified via ion-exchange chromatography (AKTA FPLC, Fractogel DEAE 48 mL column volume, stepwise gradient 0->70% TEAB in water, pH 7.5). Fractions containing the desired product were pooled, the concentration was determined by A260, and the pool was concentrated under reduced pressure, with removal of residual triethylammonium bicarbonate via iterative concentration from water (5×50 mL) to dryness to provide N⁶-benzoyl-2′-deoxyadenosine triphosphate.

Controlled synthesis of homopolymer tracts, comprised of modified nucleotides, by a nucleotidyl transferase, TdT, was conducted in the following manner. Stock solutions of deoxyadenosine triphosphate (TriLink Biosciences) and N⁶-benzoyl-deoxyadenosine triphosphate at 1 mM each were prepared in H₂O.

0.5 μL (500 pmoles) of each of the different triphosphates was separately combined with 1.5 μL (30 U) of commercially available TdT (Thermo Scientific), 0.5 μL (50 pmoles) of 5′-TAATAATAATAATTTTT-3′ (IDT) SEQ ID NO. 1, 2 μL of commercially available TdT Rxn buffer (Thermo Scientific; 1 M potassium cacodylate, 0.125 M Tris, 0.05% (v/v) Triton X-100, 5 mM CoCl₂ (pH 7.2 at 25° C.)) and 4.5 μL of H₂O. The reactions were incubated at 37° C. 30 μL aliquots were removed after 1 min, 5 min, and 15 min and each quenched with 20 μL 5 mM EDTA. Each sample was dried down under vacuum and reconstituted in 100 μL H₂O. 10 μL of each timepoint was mixed with 10 μL of denaturing load buffer (100% formamide and 0.1% Orange G) and applied to the well of a 1 mm×20 cm×14 cm 20% polyacrylamide gel. After electrophoresis at 400 V for 3.5 h, the bands were visualized with Sybr Gold (Thermo Scientific) and photographed under UV illumination (UV-blocking Wratten 2A filter, 405 nm cutoff, UVP, LLC).
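The stated amounts can be checked with a short calculation. The sketch below converts the stock volume and concentration into picomoles and computes the nucleotide-to-initiator ratio; treating that ratio as an upper bound on the average tract length is an assumption of this example (it presumes every initiator is extended and all nucleotide is consumed), in line with the length control by dATP consumption described for FIG. 11.

```python
def picomoles(volume_ul: float, conc_mM: float) -> float:
    """Amount in pmol: 1 µL of a 1 mM solution holds 1 nmol = 1000 pmol."""
    return volume_ul * conc_mM * 1000.0


dntp_pmol = picomoles(0.5, 1.0)   # 0.5 µL of the 1 mM triphosphate stock
initiator_pmol = 50.0             # initiator amount stated in the protocol

# Assumed upper bound on the mean number of nucleotides added per initiator.
max_mean_length = dntp_pmol / initiator_pmol

print(dntp_pmol, max_mean_length)  # 500.0 10.0
```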

For synthesis of multiple homopolymer tracts, the initiator oligonucleotide may be attached to a bead to allow multiple rounds of enzymatic synthesis interspersed with removal of the previous reactants and washes. 5′-biotin-TAATAATAATAATTTTT-3′ (IDT) SEQ ID NO. 1 can be incubated with streptavidin coated magnetic sepharose microbeads (GE Healthcare Life Sciences). Oligonucleotide-charged beads can be prepared by removing an aliquot of bead slurry and transferring it to a filter cup. The beads can then be washed five times with 1×PBS (using 2× the volume of bead slurry), with vortexing and spinning down at each rinse. % A bead slurry volume of 1×PBS may then be added, and biotinylated oligonucleotide can be spiked in at 1/10th of the published bead binding capacity. The mixture can be incubated at 37° C. for 2 hours with vortexing every 30 min. After 2 hours, a small amount of supernatant may be removed and the A260 measured for any unbound oligonucleotide. Once the A260 shows <10% remaining oligonucleotide, the beads can be washed 5× with MQ water. The washed beads can be brought up to a desired concentration in MQ water.

Homopolymer synthesis can be performed as described above using 2-10× molar equivalents of the desired dNTP relative to bead bound oligonucleotide. The reaction mixture containing beads, dNTP, TdT and buffer can be incubated at 37° C. for 15 minutes. The reactions may be stopped with 10 μL EDTA and rinsed 3× with water. A new cycle of homopolymer synthesis may be initiated by adding fresh TdT enzyme, dNTP and buffer and incubating at 37° C. for 15 minutes. After stopping the reaction with EDTA and rinsing 3× with water, the cycle of homopolymer synthesis can be repeated as many times as desired. After the last EDTA quench and 3× rinsing with water, the support-bound alternating homopolymers can be cleaved from the solid support using 100 μL conc. ammonium, and the supernatant dried down by Gen-vac, then stored at −20° C. until ready to be analyzed using a polyacrylamide gel as described above.

In another experiment, a two-character message, “MA”, was converted into binary and then into a base-two nucleotide code as shown in FIG. 12. Each letter character was converted into a corresponding number from 1 to 26 (A→1, . . . Z→26). Each number was then converted into a six-bit binary number by padding the normal binary representation of 1 to 26 with leading zeros (e.g., “M”=001101; “A”=000001). Each bit of the binary representation was converted into a base-two nucleotide representation according to the following table:

Bit value    Odd bit position    Even bit position
“0”          A                   C
“1”          T                   G

Thus “MA” was translated to 001101 000001 and then to the single nucleotide string, ACTGAGACACAG, SEQ ID NO. 2, which was synthesized as A_(n)C_(n)T_(n)G_(n)A_(n)G_(n)A_(n)C_(n)A_(n)C_(n)A_(n)G_(n), where each nucleotide is synthesized as a variable length homopolymer.
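By way of illustration only, the letter-to-bit-to-nucleotide mapping described above can be sketched in Python as follows. The function and variable names (`letter_to_bits`, `encode`, `ODD`, `EVEN`) are hypothetical and are not part of the disclosure; the sketch assumes messages consisting solely of the letters A to Z.

```python
# Illustrative sketch only; names are hypothetical, not from the disclosure.
# Odd bit positions (1st, 3rd, 5th, ...) use A/T; even positions use C/G.
ODD = {"0": "A", "1": "T"}
EVEN = {"0": "C", "1": "G"}


def letter_to_bits(ch: str, width: int = 6) -> str:
    """Map A..Z to 1..26 and return a zero-padded binary string."""
    value = ord(ch.upper()) - ord("A") + 1
    if not 1 <= value <= 26:
        raise ValueError(f"unsupported character: {ch!r}")
    return format(value, f"0{width}b")


def encode(message: str) -> str:
    """Return one nucleotide per bit; each is later written as a homopolymer tract."""
    bits = "".join(letter_to_bits(c) for c in message)
    # enumerate() is 0-based, so index 0 is bit position 1 (an odd position).
    return "".join((ODD if i % 2 == 0 else EVEN)[b] for i, b in enumerate(bits))


print(letter_to_bits("M"), letter_to_bits("A"))  # 001101 000001
print(encode("MA"))                              # ACTGAGACACAG
```

Because odd and even bit positions draw on disjoint nucleotide pairs, adjacent tracts in the resulting strand never share the same nucleotide, so tract boundaries remain distinguishable regardless of tract length.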

Synthesis of the 12-bit homopolymer encoded nucleic acid was conducted with a 5′-biotinylated, 39 nt long oligonucleotide initiator attached to 34 μm streptavidin-sepharose beads (GE Healthcare) at ~20 pmol/μl of beads: 5′Biotin-CAGGTCCTAUCGATATCTGTGAGCTTAATGTCCTTATGT-3′, SEQ ID NO. 3.

The oligonucleotide contains two features for releasing the final product from the solid support used during synthesis: 1) a single deoxyuridine residue that allows cleavage with the USER enzyme system (New England Biolabs) and 2) an EcoRV endonuclease restriction site. Starting with ~2 nmol of bead-bound initiator, each variable length homopolymer was enzymatically synthesized using TdT and one of four modified nucleotide triphosphates. Each reaction was conducted in a total volume of 750 μl containing 40-100 μM modified dNTP (40 μM for A; 100 μM for C; 50 μM for T; 100 μM for G), 20 U TdT (Thermo-Fisher Scientific) and 1×TdT Buffer (Thermo-Fisher Scientific), with incubation for 2.5-20 min at 37° C. After each enzymatic extension step, the reaction was quenched by adding 500 μl of 250 mM EDTA in 10 mM Tris buffer (pH 6.8). The beads were recovered by centrifugation at 10,000×g and removal of the supernatant. FIG. 13 shows the PAGE analysis of each cycle of enzymatic synthesis of a 12-bit homopolymer data strand. Each lane is marked with the cycle number, starting with “N”, which shows the unreacted 39 nt initiator. The black arrow points to a 60 nt oligonucleotide size marker.
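As a rough check on the reaction stoichiometry stated above, the amount of dNTP in each 750 μl reaction and its molar excess over the ~2 nmol of bead-bound initiator can be estimated as sketched below. The helper name `dntp_excess` is hypothetical, and the calculation treats the approximate initiator amount as exactly 2 nmol, which is an assumption.

```python
# Illustrative arithmetic only; the helper name is hypothetical.
def dntp_excess(conc_uM: float, volume_uL: float, initiator_nmol: float):
    """Return (nmol of dNTP in the reaction, molar excess over initiator)."""
    dntp_nmol = conc_uM * volume_uL / 1000.0  # uM x uL = pmol; /1000 gives nmol
    return dntp_nmol, dntp_nmol / initiator_nmol


# Concentrations as stated for the four modified dNTPs, in uM.
CONCENTRATIONS_uM = {"A": 40, "C": 100, "T": 50, "G": 100}

for base, conc in CONCENTRATIONS_uM.items():
    nmol, excess = dntp_excess(conc, volume_uL=750, initiator_nmol=2)
    print(f"{base}: {nmol:.1f} nmol dNTP, ~{excess:.1f}x over initiator")
```

Under these assumptions the modified dNTP is present at roughly 15-fold to 38-fold molar excess over the initiator in each extension step.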

After removal of the full-length data strand from the solid support, NGS library preparation was performed using the ACCEL-NGS® 1S PLUS DNA LIBRARY KIT (Swift Bioscience) following the manufacturer's instructions, and the library was subsequently sequenced using an Illumina MiSeq System. FIGS. 14A-L show histograms of the observed base composition of each of the twelve nucleotide additions (as labelled) generated during the enzymatic synthesis.
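For illustration, reading a homopolymer-encoded strand back into text can be sketched as collapsing each tract to a single symbol and then reversing the odd/even bit table. The sketch below assumes an idealized, error-free read; it is not the analysis used to generate FIGS. 14A-L, and the function names and the example read are hypothetical.

```python
# Illustrative sketch only; assumes an error-free read. Names are hypothetical.
from itertools import groupby

ODD_DECODE = {"A": "0", "T": "1"}   # odd bit positions
EVEN_DECODE = {"C": "0", "G": "1"}  # even bit positions


def collapse_tracts(read: str) -> str:
    """Reduce each run of identical nucleotides to a single nucleotide."""
    return "".join(base for base, _ in groupby(read))


def decode(read: str, width: int = 6) -> str:
    """Collapse tracts, map nucleotides to bits, then bits to letters A..Z."""
    bits = []
    for i, base in enumerate(collapse_tracts(read)):
        table = ODD_DECODE if i % 2 == 0 else EVEN_DECODE
        if base not in table:
            raise ValueError(f"unexpected nucleotide {base!r} at tract {i + 1}")
        bits.append(table[base])
    bitstring = "".join(bits)
    if len(bitstring) % width:
        raise ValueError("number of tracts is not a multiple of the word width")
    letters = []
    for j in range(0, len(bitstring), width):
        value = int(bitstring[j:j + width], 2)
        if not 1 <= value <= 26:
            raise ValueError(f"value {value} does not map to a letter")
        letters.append(chr(ord("A") + value - 1))
    return "".join(letters)


# Hypothetical read with tract lengths varying between 1 and 4 nucleotides.
print(decode("AAACCTTTTGGAGGGAACAAACCCAGG"))  # MA
```

Tract length carries no information in this scheme, which is why variable-length homopolymers of any length decode to the same message.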

Example 2

Procedures for Synthesizing Class I—Purine & Pyrimidine dNTP Analogs

Schemes for synthesis of Class I dNTP analogs

Example 3

Detailed Procedures for 3′-O-Nitrobenzyl Deoxyadenosine:

9-[β-D-5′-hydroxy-2′-deoxyribofuranosyl]-6-chloropurine (1.00 g, 3.69 mmol) and imidazole (554 mg, 8.12 mmol) were dissolved in anhydrous dimethylformamide (18 mL), followed by addition of tert-butyldimethylsilyl chloride (611 mg, 3.93 mmol). The reaction mixture was stirred at room temperature for 20 hours under argon. The dried residue was impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 2:1) to obtain 9-[β-D-5′-O-(tert-butyldimethylsilyl)-2′-deoxyribofuranosyl]-6-chloropurine.

9-[β-D-5′-O-(tert-butyldimethylsilyl)-2′-deoxyribofuranosyl]-6-chloropurine (1.73 g, 4.48 mmol) was dissolved in anhydrous dichloromethane (135 mL). Tetrabutylammonium bromide (722 mg, 2.24 mmol), 2-nitrobenzyl bromide (2.41 g, 11.2 mmol) and 40% aqueous sodium hydroxide (65 mL) were added to the previously made solution. The reaction mixture was stirred at room temperature for 1 h and diluted with ethyl acetate (300 mL). The layers were separated. The aqueous layer was extracted with ethyl acetate (125 mL×2). The combined organic layers were dried over anhydrous sodium sulfate. The organic layer was impregnated on silica gel, followed by purification with flash column chromatography (hexanes/ethyl acetate, 2:1) to yield 9-[β-D-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyribofuranosyl]-6-chloropurine.

9-[β-D-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyribofuranosyl]-6-chloropurine (2.22 g, 3.83 mmol) was dissolved in anhydrous tetrahydrofuran (40 mL) and cooled down to 0° C., followed by dropwise addition of 1.0 M tetrabutylammonium fluoride in tetrahydrofuran solution (4.20 mL, 4.20 mmol). The reaction mixture was stirred at room temperature for 1 h. After the reaction mixture was dried, the residue was dissolved in dioxane and 7 N ammonia in ethanol (40 mL). The reaction mixture was stirred in a sealed round-bottom flask at 90° C. for 18 h. The reaction mixture was impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 20:1) to obtain 3′-O-(2-nitrobenzyl)-2′-deoxyadenosine.

3′-O-(2-nitrobenzyl)-2′-deoxyadenosine (15 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.
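The reagent quantities in the triphosphorylation step follow from the stated equivalents relative to 38 μmol of nucleoside. A minimal sketch of that arithmetic is given below; the helper name `reagent_amount` is hypothetical, and the molecular weights and density used (phosphoryl trichloride, 153.33 g/mol and 1.645 g/mL; N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine, 214.31 g/mol) are standard literature values assumed for illustration rather than figures taken from this disclosure.

```python
# Illustrative arithmetic only; constants are assumed literature values.
def reagent_amount(substrate_umol, equivalents, mw_g_per_mol, density_g_per_mL=None):
    """Return (umol, mg, uL) of reagent; uL is None when no density is given."""
    umol = substrate_umol * equivalents
    mg = umol * mw_g_per_mol / 1000.0  # umol x g/mol = ug; /1000 gives mg
    uL = mg / density_g_per_mL if density_g_per_mL else None  # mg / (g/mL) = uL
    return umol, mg, uL


# Phosphoryl trichloride, 3 Eq relative to 38 umol nucleoside.
umol, mg, uL = reagent_amount(38, 3, 153.33, 1.645)
print(f"POCl3: {umol / 1000:.2f} mmol, {mg:.1f} mg, {uL:.1f} uL")

# N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine, 4 Eq (added as a solid).
umol, mg, _ = reagent_amount(38, 4, 214.31)
print(f"Diamine base: {umol / 1000:.2f} mmol, {mg:.1f} mg")
```

Under these assumptions the calculated values (about 0.11 mmol and 10.6 μL of phosphoryl trichloride; about 0.15 mmol and 32.6 mg of the diamine base) are close to the stated 11 μL, delivered as 6 μL and 5 μL aliquots, and 33 mg.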

Example 4

Detailed Procedures for 3′-O-Nitrobenzyl Deoxycytidine:

3′,5′-di-O-(tert-butyldimethylsilyl)-2′-deoxyuridine (1.00 g, 2.12 mmol) was dissolved in anhydrous acetonitrile (90 mL) and cooled down to 0° C. under argon. Phosphoryl trichloride (1.49 mL, 2.12 mmol) was added dropwise over 2 minutes. After 10 minutes, triethylamine (11.1 mL, 79.7 mmol) was added dropwise over 3 minutes. After 15 minutes, the reaction mixture was stirred at room temperature for 2 h. The reaction mixture was cooled to 0° C. and triazole (4.40 g, 63.7 mmol) was added as a solid in one portion. A precipitate was observed, and the suspension was stirred for 30 minutes. After stirring at room temperature for 2 h, the reaction mixture was concentrated to dryness. The residue was dissolved in dichloromethane (30 mL) and washed with saturated solution of sodium bicarbonate (25 mL×2) and brine (25 mL). The organic layer was dried over anhydrous sodium sulfate and concentrated to dryness. The residue was dissolved in dichloromethane, followed by addition of allyl alcohol (2.00 mL, 29.4 mmol) and triethylamine (2.67 mL, 18.9 mmol). The reaction mixture was stirred at 0° C. for 15 minutes. DBU (0.33 mL, 2.17 mmol) was added and the mixture was stirred at room temperature for 6 h. The reaction mixture was diluted with dichloromethane (17 mL) and washed with brine (15 mL). The organic layer was dried over anhydrous sodium sulfate. The organic layer was impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 4:1) to yield 4-O-allyl-3′,5′-di-O-(tert-butyldimethylsilyl)-2′-deoxyuridine.

4-O-allyl-3′,5′-di-O-(tert-butyldimethylsilyl)-2′-deoxyuridine (3.37 g, 5.27 mmol) was dissolved in dry tetrahydrofuran (50 mL). Triethylamine (1.98 mL, 14.2 mmol) was added followed by the triethylammonium fluoride dihydrofluoride (2.32 mL, 14.2 mmol) under argon. The reaction mixture was stirred at room temperature for 29 h, followed by concentration. The residue was dissolved in dichloromethane (100 mL) and washed with 1.5 M ammonium carbonate (75 mL×1) and brine (75 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain 4-O-allyl-2′-deoxyuridine.

4-O-allyl-2′-deoxyuridine (1.18 g, 4.40 mmol) was dissolved in anhydrous pyridine (37 mL), followed by addition of tert-butyldimethylsilyl chloride (815 mg, 5.41 mmol) under argon. The reaction mixture was stirred at room temperature for 20 h. After concentration, the residue was dissolved in dichloromethane and impregnated on silica. The crude product was purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain 4-O-allyl-5′-O-(tert-butyldimethylsilyl)-2′-deoxyuridine.

To a mixture of 4-O-allyl-5′-O-(tert-butyldimethylsilyl)-2′-deoxyuridine (1.28 g, 3.35 mmol), tetrabutylammonium hydroxide (1.5 mL, 55-60% in water) and sodium iodide (50.0 mg, 0.335 mmol) in dichloromethane/water (20 mL, 1:1) was added 1.0 M sodium hydroxide solution (10 mL) under argon. The reaction mixture was stirred for 10 minutes at room temperature, followed by addition of 2-nitrobenzyl bromide (1.45 g, 6.70 mmol) in 10 mL dichloromethane over 5 minutes. After stirring at room temperature for 7 h, the reaction mixture was diluted with dichloromethane (150 mL). The organic layer was washed with brine (20 mL) and dried over anhydrous sodium sulfate. The organic layer was impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 1:1) to yield 4-O-allyl-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyuridine.

4-O-allyl-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyuridine (1.55 g, 3.00 mmol) was dissolved in 7 N ammonia in ethanol (55 mL) and stirred in a sealed round-bottom flask at 55° C. for 20 h. The reaction mixture was impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 20:1) to give 5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine.

5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine (2.22 g, 3.83 mmol) was dissolved in anhydrous tetrahydrofuran (40 mL) and cooled down to 0° C., followed by dropwise addition of 1.0 M tetrabutylammonium fluoride in tetrahydrofuran solution (4.20 mL, 4.20 mmol). The reaction mixture was stirred at room temperature for 1 h. The reaction mixture was then impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 8:2) to afford 3′-O-(2-nitrobenzyl)-2′-deoxycytidine.

3′-O-(2-nitrobenzyl)-2′-deoxycytidine (14 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 5

Detailed Procedures for 3′-O-Nitrobenzyl Deoxyguanosine:

2-Amino-6-chloro-9-[β-D-2′-deoxyribofuranosyl]purine (1.00 g, 3.50 mmol) and imidazole (715 mg, 10.50 mmol) were dissolved in anhydrous dimethylformamide (18 mL), followed by addition of tert-butyldimethylsilyl chloride (686 mg, 4.60 mmol). The reaction mixture was stirred at room temperature for 12 hours under argon. The dried residue was impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to afford 2-amino-6-chloro-9-[β-D-5′-O-(tert-butyldimethylsilyl)-2′-deoxyribofuranosyl]purine.

2-Amino-6-chloro-9-[β-D-5′-O-(tert-butyldimethylsilyl)-2′-deoxyribofuranosyl]purine (1.19 g, 3.00 mmol) was dissolved in anhydrous tetrahydrofuran (8.0 mL), followed by addition of N,N-dimethylformamide dimethyl acetal (3.10 mL, 18.0 mmol) at room temperature. The reaction mixture was stirred at 40° C. for 3 h. The reaction mixture was impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain 6-chloro-N²-[(dimethylaminomethylene)amino]-9-[β-D-5′-O-(tert-butyldimethylsilyl)-2′-deoxyribofuranosyl]purine.

6-Chloro-N²-[(dimethylaminomethylene)amino]-9-[β-D-5′-O-(tert-butyldimethylsilyl)-2′-deoxyribofuranosyl]purine (1.09 g, 2.40 mmol) was dissolved in anhydrous acetonitrile (3.5 mL), followed by addition of sodium hydride powder in mineral oil (60%) (122 mg, 4.80 mmol) at 0° C. After stirring at room temperature for 1 h, a solution of 2-nitrobenzyl bromide (1.04 g, 4.80 mmol) in anhydrous acetonitrile (1.5 mL) was added. After stirring at room temperature for 2 h, the reaction mixture was filtered. The filtrate was dried and the obtained residue was dissolved in ethyl acetate (100 mL). The organic layer was washed with saturated solution of sodium bicarbonate (50 mL) and brine (50 mL) and dried over anhydrous sodium sulfate. The organic layer was impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 4:6) to give 6-chloro-N²-[(dimethylaminomethylene)amino]-9-[β-D-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyribofuranosyl]purine.

6-Chloro-N²-[(dimethylaminomethylene)amino]-9-[β-D-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyribofuranosyl]purine (1.02 g, 1.73 mmol) was dissolved in anhydrous dimethylformamide (15 mL), followed by addition of cesium acetate (996 mg, 5.19 mmol), 1,4-diazabicyclo[2.2.2]octane (194 mg, 1.73 mmol) and triethylamine (0.72 mL, 5.19 mmol) under argon. The reaction mixture was stirred at room temperature for 18 h. Acetic anhydride (5 mL) was added and the mixture was stirred for 0.5 h. The reaction mixture was quenched with water (100 mL) and extracted with ethyl acetate (100 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 20:1) to give 5′-O-(tert-butyldimethylsilyl)-N²-[(dimethylamino)methylene]-3′-O-(2-nitrobenzyl)-2′-deoxyguanosine.

5′-O-(tert-butyldimethylsilyl)-N²-[(dimethylamino)methylene]-3′-O-(2-nitrobenzyl)-2′-deoxyguanosine (732 mg, 1.28 mmol) was dissolved in anhydrous tetrahydrofuran (8 mL) under argon and cooled down to 0° C., followed by dropwise addition of 1.0 M tetrabutylammonium fluoride in tetrahydrofuran solution (2.56 mL, 2.56 mmol). The reaction mixture was stirred at room temperature for 2 h. The reaction mixture was poured into cold water (50 mL) and extracted with ethyl acetate (50 mL×2). The organic layers were combined and dried over anhydrous sodium sulfate. The organic layer was impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 10:1) to yield N²-[(dimethylamino)methylene]-3′-O-(2-nitrobenzyl)-2′-deoxyguanosine.

N²-[(dimethylamino)methylene]-3′-O-(2-nitrobenzyl)-2′-deoxyguanosine (17 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 6

Detailed Procedures for 3′-O-Nitrobenzyl Deoxythymidine:

Thymidine (2.50 g, 10.3 mmol) was suspended in dimethylformamide (60 mL) at room temperature under argon. To the suspension, imidazole (4.22 g, 61.9 mmol) and tert-butylchlorodimethylsilane (4.66 g, 30.1 mmol) were added. After stirring for 2 h, the reaction mixture was quenched with methanol (8 mL) and diluted with ethyl acetate (200 mL). The organic layer was washed with water (100 mL×2), saturated solution of sodium bicarbonate (100 mL) and brine (100 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 8:2) to yield 3′,5′-di-O-(tert-butyldimethylsilyl)thymidine.

3′,5′-di-O-(tert-butyldimethylsilyl)thymidine (4.34 g, 9.22 mmol) and dimethyl-4-aminopyridine (1.12 g, 9.22 mmol) were dissolved in anhydrous dichloromethane (140 mL). Triethylamine (5.14 mL, 36.9 mmol) was added and the reaction mixture was cooled to 0° C. Benzoyl chloride (3.21 mL, 27.7 mmol) was added dropwise and the mixture was allowed to warm up to room temperature. After stirring for 14 h, saturated solution of sodium bicarbonate (80 mL) was added and the layers were separated. The aqueous layer was extracted with dichloromethane (200 mL×2). The combined organic layers were washed with water (300 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 8:2) to afford 3-N-benzoyl-3′,5′-di-O-(tert-butyldimethylsilyl)thymidine.

3-N-benzoyl-3′,5′-di-O-(tert-butyldimethylsilyl)thymidine (3.03 g, 5.27 mmol) was dissolved in dry tetrahydrofuran (50 mL). Triethylamine (1.98 mL, 14.2 mmol) was added followed by the triethylammonium fluoride dihydrofluoride (2.32 mL, 14.2 mmol) under argon. The reaction mixture was stirred at room temperature for 29 h, followed by concentration. The residue was dissolved in dichloromethane (100 mL) and washed with 1.5 M ammonium carbonate (75 mL) and brine (75 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to give 3-N-benzoylthymidine.

3-N-benzoylthymidine (XX g, 4.40 mmol) was dissolved in anhydrous pyridine (37 mL), followed by addition of tert-butyldimethylsilyl chloride (815 mg, 5.41 mmol) under argon. The reaction mixture was stirred at room temperature for 20 h. After concentration, the residue was dissolved in dichloromethane and impregnated on silica. The crude product was purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain 3-N-benzoyl-5′-O-(tert-butyldimethylsilyl)thymidine.

To 3-N-benzoyl-5′-O-(tert-butyldimethylsilyl)thymidine (1.18 g, 2.56 mmol), an aqueous solution of tetrabutylammonium hydroxide (10 mL, 60%) was added, followed by sodium iodide (76.7 mg, 0.51 mmol), dichloromethane (10 mL), water (10 mL) and an aqueous solution of 1 M sodium hydroxide (10 mL). This mixture was added dropwise to a solution of 2-nitrobenzyl bromide (718 mg, 3.32 mmol) in dichloromethane (10 mL). After stirring the reaction mixture at room temperature for 6 h, water (10 mL) was added. The aqueous layer was extracted with dichloromethane (50 mL×3). The organic layers were combined, washed with brine (100 mL) and dried over anhydrous sodium sulfate. After filtration and concentration, the residue was dissolved in ethyl acetate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 6:4) to give 3-N-benzoyl-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)thymidine.

3-N-benzoyl-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)thymidine (1.24 g, 2.08 mmol) was dissolved in ethanol (15 mL), followed by addition of 30% ammonium hydroxide solution (1.5 mL, 12.3 mmol). The reaction mixture was stirred at room temperature for 1 h, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 6:4) to yield 5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)thymidine.

5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)thymidine (61 mg, 0.9 mmol) was dissolved in anhydrous tetrahydrofuran (12 mL) under argon and cooled down to 0° C., followed by dropwise addition of 1.0 M tetrabutylammonium fluoride in tetrahydrofuran solution (2.78 mL, 2.78 mmol). The reaction mixture was stirred at room temperature for 2 h. The reaction mixture was poured into cold water (50 mL) and extracted with ethyl acetate (50 mL×3). The organic layers were combined and dried over anhydrous sodium sulfate. The organic layer was impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 10:1) to give 3′-O-(2-nitrobenzyl)thymidine.

3′-O-(2-nitrobenzyl)thymidine (15 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 6

Procedures for synthesizing Class II—Purine & Pyrimidine dNTP analogs.

Schemes for Synthesis of Class II Non-Peptide dNTP Analogs

Schemes for synthesis of Class II peptide-dNTP analogs

Example 7

Detailed Procedures for Synthesis of Class II Non-Peptide dATP Constructs:

5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyadenosine (0.24 mmol) was dissolved in anhydrous tetrahydrofuran (4 mL) at room temperature under argon. Sodium hydride (48 mg, 1.21 mmol) was added in one portion. After stirring for 1 h, the reaction mixture was cooled to 0° C. and acyl chloride (3 eq, 0.723 mmol) was added dropwise. The reaction mixture was stirred for 18 h at room temperature. The reaction mixture was poured into a cold solution of saturated sodium bicarbonate and dichloromethane. The layers were separated. The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 2:8) to yield the desired product.

N-substituted nucleoside (5.27 mmol) was dissolved in dry tetrahydrofuran (50 mL). Triethylamine (1.98 mL, 14.2 mmol) was added followed by the triethylammonium fluoride dihydrofluoride (2.32 mL, 14.2 mmol) under argon. The reaction mixture was stirred at room temperature for 29 h, followed by concentration. The residue was dissolved in dichloromethane (100 mL) and washed with 1.5 M ammonium carbonate (75 mL×1) and brine (75 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain N-substituted 3′-O-(2-nitrobenzyl)-2′-deoxyadenosine.

N-substituted 3′-O-(2-nitrobenzyl)-2′-deoxyadenosine (1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 8

Detailed Procedure for Synthesis of Class II Non-Peptide dCTP Constructs:

5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine (524.3 mg, 1.10 mmol) and carboxylic acid (1.32 mmol) were dissolved in anhydrous dimethylformamide under argon. N,N-Diisopropylethylamine (0.48 mL, 2.74 mmol) and 2-(3H-[1,2,3]triazolo[4,5-b]pyridin-3-yl)-1,1,3,3-tetramethylisouronium hexafluorophosphate (V) (417 mg, 1.10 mmol) were added. The reaction mixture was stirred at room temperature for 18 h and diluted with ethyl acetate (30 mL). The organic layer was washed with saturated sodium bicarbonate solution (30 mL), dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 4:6) to yield N-substituted 5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine.

N-substituted 5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine (3.37 g, 5.27 mmol) was dissolved in dry tetrahydrofuran (50 mL). Triethylamine (1.98 mL, 14.2 mmol) was added followed by the triethylammonium fluoride dihydrofluoride (2.32 mL, 14.2 mmol) under argon. The reaction mixture was stirred at room temperature for 29 h, followed by concentration. The residue was dissolved in dichloromethane (100 mL) and washed with 1.5 M ammonium carbonate (75 mL×1) and brine (75 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain N-substituted 3′-O-(2-nitrobenzyl)-2′-deoxycytidine.

N-substituted 3′-O-(2-nitrobenzyl)-2′-deoxycytidine (1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 9

Detailed Procedure for Synthesis of Peptide-dATP Conjugates:

5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyadenosine (524.3 mg, 1.10 mmol) and 4-(pyridin-2-yldisulfaneyl)butanoic acid (302.7 mg, 1.32 mmol) were dissolved in anhydrous dimethylformamide under argon. N,N-Diisopropylethylamine (0.48 mL, 2.74 mmol) and 2-(3H-[1,2,3]triazolo[4,5-b]pyridin-3-yl)-1,1,3,3-tetramethylisouronium hexafluorophosphate (V) (417 mg, 1.10 mmol) were added. The reaction mixture was stirred at room temperature for 18 h and diluted with ethyl acetate (30 mL). The organic layer was washed with saturated sodium bicarbonate solution (30 mL), dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 4:6) to yield N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyadenosine.

N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxyadenosine (3.75 g, 5.27 mmol) was dissolved in dry tetrahydrofuran (50 mL). Triethylamine (1.98 mL, 14.2 mmol) was added followed by the triethylammonium fluoride dihydrofluoride (2.32 mL, 14.2 mmol) under argon. The reaction mixture was stirred at room temperature for 29 h, followed by concentration. The residue was dissolved in dichloromethane (100 mL) and washed with 1.5 M ammonium carbonate (75 mL×1) and brine (75 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-3′-O-(2-nitrobenzyl)-2′-deoxyadenosine.

N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-3′-O-(2-nitrobenzyl)-2′-deoxyadenosine (23 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 10 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 10

Detailed Procedure for Synthesis of Peptide-dCTP Conjugates:

5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine (524.3 mg, 1.10 mmol) and 4-(pyridin-2-yldisulfaneyl)butanoic acid (1.32 mmol) were dissolved in anhydrous dimethylformamide under argon. N,N-Diisopropylethylamine (0.48 mL, 2.74 mmol) and 2-(3H-[1,2,3]triazolo[4,5-b]pyridin-3-yl)-1,1,3,3-tetramethylisouronium hexafluorophosphate (V) (417 mg, 1.10 mmol) were added. The reaction mixture was stirred at room temperature for 18 h and diluted with ethyl acetate (30 mL). The organic layer was washed with saturated sodium bicarbonate solution (30 mL), dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (hexane/ethyl acetate, 4:6) to yield N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine.

N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-5′-O-(tert-butyldimethylsilyl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine (3.62 g, 5.27 mmol) was dissolved in dry tetrahydrofuran (50 mL). Triethylamine (1.98 mL, 14.2 mmol) was added followed by the triethylammonium fluoride dihydrofluoride (2.32 mL, 14.2 mmol) under argon. The reaction mixture was stirred at room temperature for 29 h, followed by concentration. The residue was dissolved in dichloromethane (100 mL) and washed with 1.5 M ammonium carbonate (75 mL×1) and brine (75 mL). The organic layer was dried over anhydrous sodium sulfate, impregnated on silica and purified by flash column chromatography (dichloromethane/methanol, 9:1) to obtain N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine.

N-(4-(pyridine-2-yldisulfaneyl)butanyryl)-3′-O-(2-nitrobenzyl)-2′-deoxycytidine (22 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 10 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 11

Procedures for synthesizing Class III—Purine & Pyrimidine dNTP analogs.

Schemes for synthesis of Class III dNTP analogs

Example 12

Detailed Procedures for Class III—Purine dNTP Analogs:

To a solution of phosgene (6.46 g, 2 Eq, 65.3 mmol) in dry toluene (100 mL) at 23° C. was added (2-nitrophenyl)methanol (5.00 g, 1 Eq, 32.6 mmol) in 20 mL dry THF. The reaction was stirred for 24 hours at 23° C. The reaction was then concentrated to dryness under a vacuum trapped with a NaOH aqueous solution. The amber oil that remained was used directly in the next reaction without further purification.

To a solution of 9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-9H-purin-6-amine (12.2 g, 1.1 Eq, 25.5 mmol) in dry DMF (100 mL) at 0° C. was added N,N-diisopropylethylamine (3.60 g, 4.9 mL, 1.2 Eq, 27.8 mmol). The reaction was stirred for 30 minutes and then charged with 2-nitrobenzyl carbonochloridate (5.00 g, 1 Eq, 23.2 mmol) slowly dropwise over 30 minutes, keeping the temperature below 5° C. The reaction was then allowed to warm to room temperature and stirred overnight. The reaction was poured into a cooled solution of 5% Na2CO3 and EtOAc. The EtOAc layer was dried with sodium sulfate and then concentrated to dryness. The crude product was chromatographed on silica gel using hexane/EtOAc mixtures to give a purified product which was used in the next reaction.

2-nitrobenzyl (9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-9H-purin-6-yl)carbamate (5.00 g, 1 Eq, 7.59 mmol) was dissolved in THF at room temperature and then cooled to 0° C. under a blanket of dry argon. The mixture was then charged with tetrabutylammonium fluoride (4.96 g, 2.5 Eq, 19.0 mmol). The mixture was stirred for 2 hours at 0° C. and then warmed to 23° C. for 1 hour. The solution was poured into a cold solution of 10% NaHCO₃ and extracted with DCM. The DCM layer was concentrated and the crude product purified on silica gel eluting with 5-50% DCM/MeOH to afford the product suitable for triphosphorylation.

2-nitrobenzyl (9-((2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)-9H-purin-6-yl)carbamate (28 mg, 1 Eq, 65 μmol) was dissolved in trimethyl phosphate (1.5 mL) and 0.60 mL of dry pyridine and cooled in an ice bath under argon. A first aliquot of phosphoryl trichloride (30 mg, 18 μL, 3 Eq, 0.20 mmol) was added. Five minutes later, a second aliquot of 10 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.23 g, 4 Eq, 0.26 mmol) in dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 seconds at reaction time t=35 min. Immediately, the pre-weighed N1,N1,N8,N8-tetramethyl-naphthalene-1,8-diamine (56 mg, 4 Eq, 0.26 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred for 30 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation.
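The 35 min time point given for the diphosphate addition follows from the preceding steps: a 5 minute wait between the two phosphoryl trichloride aliquots plus 30 minutes of stirring. The Python sketch below tabulates the elapsed time through the quench. The step labels and the function name are hypothetical, and the 30 second dropwise addition is neglected for simplicity.

```python
from itertools import accumulate

# (step label, duration in minutes) in the order given in the procedure above.
# The 30 second dropwise diphosphate addition is neglected in this sketch.
STEPS = [
    ("first phosphoryl trichloride aliquot added", 0),
    ("wait before second phosphoryl trichloride aliquot", 5),
    ("stir after second aliquot", 30),
    ("stir after diphosphate and diamine addition", 30),
    ("stir after TEAB quench", 30),
]


def timeline(steps):
    """Return (elapsed minutes at the end of the step, label) for each step."""
    elapsed = accumulate(duration for _, duration in steps)
    return [(t, label) for t, (label, _) in zip(elapsed, steps)]


if __name__ == "__main__":
    for minutes, label in timeline(STEPS):
        print(f"t = {minutes:3d} min  {label}")
```

The third entry ends at 35 minutes, which is when the diphosphate solution is added, and the full sequence through the post-quench stir takes 95 minutes before the extraction.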

Example 13

Detailed Procedures for Class III—Pyrimidine dNTP Analogs:

To a solution of 4-amino-1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)-methyl)tetrahydrofuran-2-yl)pyrimidin-2(1H)-one (9.30 g, 1.1 Eq, 20.4 mmol) in dry DMF (100 mL) at 0° C. was added N,N-diisopropylethylamine (2.88 g, 3.9 mL, 1.2 Eq, 22.3 mmol). The reaction was stirred for 30 minutes and then charged with 2-nitrobenzyl carbonochloridate (4.00 g, 1 Eq, 18.6 mmol) slowly dropwise over 30 minutes, keeping the temperature below 5° C. The reaction was then allowed to warm to room temperature and stirred overnight. The reaction was poured into a cooled solution of 5% Na2CO3 and EtOAc. The EtOAc layer was dried with sodium sulfate and then concentrated to dryness. The crude product was chromatographed on silica gel using hexane/EtOAc mixtures to give a purified product which was used in the next reaction.

2-nitrobenzyl (1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)-tetrahydrofuran-2-yl)-2-oxo-1,2-dihydropyrimidin-4-yl)carbamate (5.00 g, 1 Eq, 7.88 mmol) was dissolved in 25 mL THF at room temperature and then cooled to 0° C. under a blanket of dry argon. The mixture was then charged with tetrabutylammonium fluoride (5.15 g, 2.5 Eq, 19.7 mmol). The mixture was stirred for 2 hours at 0° C. and then warmed to 23° C. for 1 hour. The solution was poured into a cold solution of 10% NaHCO3 and extracted with DCM. The DCM layer was concentrated and the crude product purified on silica gel eluting with 5-50% DCM/MeOH to afford the product suitable for triphosphorylation.

2-nitrobenzyl (1-((2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)-2-oxo-1,2-dihydropyrimidin-4-yl)carbamate (35.0 mg, 1 Eq, 86.1 μmol) was dissolved in trimethyl phosphate (1.5 mL) and 0.60 mL of dry pyridine and cooled in an ice bath under argon. A first aliquot of phosphoryl trichloride (39.6 mg, 3 Eq, 258 μmol) was added. Five minutes later, a second aliquot of 10 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (311 mg, 4 Eq, 345 μmol) in dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 seconds at reaction time t=35 min. Immediately, the pre-weighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (73.8 mg, 4 Eq, 345 μmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred for 30 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation.

Example 14

Procedures for synthesizing Class IV—dCTP analog.

Scheme for Synthesis of Class IV dCTP Analog

Detailed Procedures for Class IV dCTP Analog:

Methyl 3,4,5-trihydroxybenzoate (10 g, 1 Eq, 54 mmol) was dissolved in 50 mL of acetone. Sodium iodide (0.81 g, 0.1 Eq, 5.4 mmol) and potassium carbonate (38 g, 5 Eq, 270 mmol) were added as solids at ambient temperature. 1-(Chloromethyl)-2-nitrobenzene (34 g, 3.6 Eq, 0.20 mol) was added dropwise as a solution in 40 mL of acetone over 10 minutes. The mixture was stirred for 1 hour and then heated to 50° C. for 6 hr. The mixture was cooled to ambient temperature and the bulk of the solvent was removed on a rotovap. The residue was suspended in 200 mL of EtOAc and this was washed successively with 200 mL portions of water and saturated aqueous NaCl solution. The EtOAc layer was dried with sodium sulfate and evaporated. The crude product was chromatographed on silica using hexane/EtOAc mixtures to give a purified product which can be used in the next reaction.

Methyl 3,4,5-tris((2-nitrobenzyl)oxy)benzoate (25 g, 1 Eq, 42 mmol) was dissolved in 300 mL of THF. Aqueous 2 M NaOH (105 mL, 210 mmol) was added and the mixture was stirred at ambient temperature for 18 h. The bulk of the THF was removed on a rotovap and the residue was acidified slowly with 6 M HCl to a pH of 1 or less. The resulting solid was filtered, washed well with water, and dried on the filter funnel for 5 hours and then under high vacuum for 18 h. The product was taken into the next reaction without further purification.

4-Amino-1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)pyrimidin-2(1H)-one (12 g, 1 Eq, 26 mmol) and 3,4,5-tris((2-nitrobenzyl)oxy)benzoic acid (15 g, 1 Eq, 26 mmol) were dissolved in 50 mL of dry DMF at ambient temperature under an argon atmosphere. N-Ethyl-N-isopropylpropan-2-amine (5.1 g, 6.8 mL, 1.5 Eq, 39 mmol) was added, followed by a solution of 1-((dimethylamino)(dimethyliminio)methyl)-1H-[1,2,3]triazolo[4,5-b]pyridine 3-oxide hexafluorophosphate(V) (12 g, 1.2 Eq, 31 mmol) in 10 mL of dry DMF, added dropwise over 5 minutes at ambient temperature. The mixture was stirred at ambient temperature for 18 h. The mixture was dissolved in 300 mL of EtOAc and this was washed successively with 200 mL portions of water (2×) and saturated aqueous NaCl solution. The EtOAc layer was dried with sodium sulfate and evaporated first on a rotovap and then under high vacuum for 18 h. The residue was chromatographed on silica using mixtures of dichloromethane and methanol to give the desired product as a colorless foam.

N-(1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-2-oxo-1,2-dihydropyrimidin-4-yl)-3,4,5-tris((2-nitrobenzyl)oxy)benzamide (5 g, 1 Eq, 5 mmol) was dissolved in 25.0 mL of dry THF at ambient temperature under argon. Triethylamine (4 g, 6 mL, 8 Eq, 40 mmol) was added rapidly, followed by triethylammonium fluoride dihydrofluoride (5 g, 5 mL, 6 Eq, 30 mmol), also added rapidly at ambient temperature. The mixture was stirred at ambient temperature for 24 h. Silica gel (20 g) was added and the mixture was evaporated on a rotovap to a fine powder and then loaded onto a 100 g silica column and eluted with mixtures of dichloromethane and methanol to give the nucleoside as a slightly yellow foam.

N-(1-((2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)-2-oxo-1,2-dihydropyrimidin-4-yl)-3,4,5-tris((2-nitrobenzyl)oxy)benzamide (30 mg, 1 Eq, 38 μmol) was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of 6 μL of phosphoryl trichloride (18 mg, 11 μL, 3 Eq, 0.11 mmol) was added. Five minutes later, a second aliquot of 5 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (0.14 g, 4 Eq, 0.15 mmol) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (33 mg, 4 Eq, 0.15 mmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Example 15

Procedures for synthesizing Class V Peptide & non-Peptide analogs.

Scheme for Synthesis of Peptide—Thymidine dNTP Conjugates

Scheme for Synthesis of Non-Peptide—Thymidine dNTP Conjugates

Example 16

Detailed Procedures for Peptide—dTTP Analogs:

4-(bromomethyl)benzenethiol (5.00 g, 1 Eq, 24.6 mmol) was dissolved in methanol (50 mL) and cooled to 0° C. The mixture was then charged with 1,2-di(pyridin-2-yl)disulfane (5.42 g, 1 Eq, 24.6 mmol) and stirred at 0° C. for 18 hours. The reaction was then concentrated directly and purified on silica gel eluting with hexanes/ethyl acetate (0-100% EtOAc) to afford the product as a white solid which was used directly in the next reaction.

1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-5-methylpyrimidine-2,4(1H,3H)-dione (5.00 g, 1 Eq, 10.6 mmol) was dissolved in 50 mL DMF and then cooled to 0° C. The reaction was stirred 30 minutes at 0° C. and then charged with sodium hydride (306 mg, 1.2 Eq, 12.7 mmol). The reaction was then allowed to stir an additional 30 minutes at 0° C. and then warmed to 23° C. The mixture was then charged with 2-((4-(bromomethyl)phenyl)-disulfaneyl)-pyridine (3.32 g, 1 Eq, 10.6 mmol) and stirring was continued for an additional 2 hours at 23° C. The reaction was then poured into a cold solution of 10% NaHCO₃ and DCM. The DCM layer was separated and dried over sodium sulfate and concentrated to dryness. The mixture was then purified on silica gel eluting with 0-20% DCM/methanol to afford the desired product.

1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)-methyl)tetrahydrofuran-2-yl)-5-methyl-3-(4-(pyridin-2-yldisulfaneyl)benzyl)pyrimidine-2,4(1H,3H)-dione (2.00 g, 1 Eq, 2.85 mmol) was dissolved in THF and cooled to 0° C. The mixture was then charged with tetrabutylammonium fluoride (2.23 g, 3 Eq, 8.55 mmol) at 0° C. The reaction was kept stirring for 2 hours at 0° C. and then warmed to 23° C. for an additional hour. The reaction was then cooled again to 0° C. and charged into a pre-cooled solution of 10% NaHCO₃ and DCM at 0° C. The DCM layer was then separated, dried over sodium sulfate, concentrated, and purified on silica gel eluting with 5-50% DCM/methanol to afford the pure product.

1-((2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)-5-methyl-3-(4-(pyridin-2-yldisulfaneyl)benzyl)pyrimidine-2,4(1H,3H)-dione (5.00 g, 1 Eq, 10.6 mmol) was dissolved in 20 mL of THF at 23° C. The mixture was then charged with triethylamine (1.07 g, 1.5 mL, 1 Eq, 10.6 mmol) and cooled to 0° C. The mixture was then charged with TBS-Cl (1.59 g, 1 Eq, 10.6 mmol) and stirring was continued for an additional 2 hours at 0° C. The mixture was then charged to a pre-cooled mixture of 10% aq. NaCl and DCM. The DCM layer was dried over sodium sulfate, concentrated and dried to afford an amber oil. The crude product was then purified on silica gel eluting with 5-50% DCM/methanol to afford the pure product.

1-((2R,4S,5R)-5-(((tert-butyldimethylsilyl)oxy)methyl)-4-hydroxytetrahydrofuran-2-yl)-5-methyl-3-(4-(pyridin-2-yldisulfaneyl)benzyl)pyrimidine-2,4(1H,3H)-dione (2.00 g, 1 Eq, 3.40 mmol) was dissolved in 20 mL DMF and then cooled to 0° C. The mixture was then charged with sodium hydride (98.0 mg, 1.2 Eq, 4.08 mmol) and stirring was continued for an additional 30 minutes at 0° C. The reaction was then charged with 1-(bromomethyl)-2-nitrobenzene (735 mg, 1 Eq, 3.40 mmol) and stirring was continued for an additional 1 hour at 0° C. The reaction was then charged into a pre-cooled mixture of 10% NaCl and EtOAc. The EtOAc layer was separated and dried over sodium sulfate and concentrated to dryness. The crude product was purified on silica gel eluting with 0-50% hexanes/EtOAc to afford the desired product.

1-((2R,4S,5R)-5-(((tert-butyldimethylsilyl)oxy)methyl)-4-((2-nitrobenzyl)oxy)tetrahydrofuran-2-yl)-5-methyl-3-(4-(pyridin-2-yldisulfaneyl)benzyl)pyrimidine-2,4(1H,3H)-dione (1.00 g, 1 Eq, 1.38 mmol) was dissolved in THF and cooled to 0° C. The reaction was then charged with TBAF (362 mg, 1 Eq, 1.38 mmol) at 0° C. and stirred for 1 hour at 0° C., then warmed to room temperature over the course of 2 hours. The mixture was then poured into a pre-cooled solution of 10% NaHCO3 and DCM. The DCM layer was separated and dried over sodium sulfate. The crude product was purified on silica gel eluting with 5-25% DCM/methanol to afford the desired product.

4-(pyridin-2-yl)benzyl(1-((2R,4S,5R)-5-(hydroxymethyl)-4-((2-nitrobenzyl)oxy)tetrahydrofuran-2-yl)-2-oxo-1,2-dihydropyrimidin-4-yl)carbamate (35.0 mg, 1 Eq, 61.0 μmol) was dissolved in trimethyl phosphate (1.5 mL) and 0.60 mL of dry pyridine and cooled in an ice bath under argon. A first aliquot of phosphoryl trichloride (39.6 mg, 3 Eq, 258 μmol) was added. Five minutes later, a second aliquot of 10 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (311 mg, 4 Eq, 345 μmol) in dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 seconds at reaction time t=35 min. Immediately, the pre-weighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (73.8 mg, 4 Eq, 345 μmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred for 30 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation.

Example 17

Detailed Procedures for Non-Peptide—dTTP Analog:

1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)-oxy)methyl)tetrahydrofuran-2-yl)-5-methylpyrimidine-2,4(1H,3H)-dione (5.00 g, 1 Eq, 10.6 mmol) was dissolved in 50 mL DMF and then cooled to 0° C. The reaction was stirred 30 minutes at 0° C. and then charged with sodium hydride (306 mg, 1.2 Eq, 12.7 mmol). The reaction was allowed to stir an additional 30 minutes at 0° C. and then warmed to 23° C. The mixture was then charged with (bromomethyl)benzene (1.82 g, 1 Eq, 10.6 mmol) and stirring was continued for an additional 2 hours at 23° C. The reaction was then poured into a cold solution of 10% NaHCO₃ and DCM. The DCM layer was separated and dried over sodium sulfate and concentrated to dryness. The mixture was then purified on silica gel eluting with 0-20% DCM/methanol to afford the desired product.

3-benzyl-1-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)-methyl)tetrahydrofuran-2-yl)-5-methylpyrimidine-2,4(1H,3H)-dione (4.00 g, 1 Eq, 7.13 mmol) was dissolved in THF and cooled to 0° C. The mixture was then charged with tetrabutylammonium fluoride (3.72 g, 2 Eq, 14.26 mmol) at 0° C. The reaction was kept stirring for 2 hours at 0° C. and then warmed to 23° C. for an additional hour. The reaction was then cooled again to 0° C. and charged into a pre-cooled solution of 10% NaHCO₃ and DCM at 0° C. The DCM layer was then separated, dried over sodium sulfate, concentrated, and purified on silica gel eluting with 5-50% DCM/methanol to afford the pure product.

3-benzyl-1-((2R,4S,5R)-4-hydroxy-5-(hydroxymethyl)tetrahydrofuran-2-yl)-5-methylpyrimidine-2,4(1H,3H)-dione (1.00 g, 1 Eq, 3.01 mmol) was dissolved in 20 mL of THF at 23° C. The mixture was then charged with triethylamine (304 mg, 0.42 mL, 1 Eq, 3.01 mmol) and cooled to 0° C. The mixture was then charged with TBS-Cl (453 mg, 1 Eq, 3.01 mmol) and stirring was continued for an additional 2 hours at 0° C. The mixture was then charged to a pre-cooled mixture of 10% aq. NaCl and DCM. The DCM layer was dried over sodium sulfate, concentrated and dried to afford an amber oil. The crude product was then purified on silica gel eluting with 5-50% DCM/methanol to afford the pure product.

3-benzyl-1-((2R,4S,5R)-5-(((tert-butyldimethylsilyl)oxy)methyl)-4-hydroxytetrahydrofuran-2-yl)-5-methylpyrimidine-2,4(1H,3H)-dione (6.00 g, 1 Eq, 13.4 mmol) was dissolved in 20 mL DMF and then cooled to 0° C. The mixture was then charged with sodium hydride (387 mg, 1.2 Eq, 16.1 mmol) and stirring was continued for an additional 30 minutes at 0° C. The reaction was then charged with 1-(bromomethyl)-2-nitrobenzene (2.90 g, 1 Eq, 13.4 mmol) and stirring was continued for an additional 1 hour at 0° C. The reaction was then charged into a pre-cooled mixture of 10% NaCl and EtOAc. The EtOAc layer was separated and dried over sodium sulfate and concentrated to dryness. The crude product was purified on silica gel eluting with 0-50% hexanes/EtOAc to afford the desired product.

3-benzyl-1-((2R,4S,5R)-5-(((tert-butyldimethylsilyl)oxy)methyl)-4-((2-nitrobenzyl)oxy)-tetrahydrofuran-2-yl)-5-methylpyrimidine-2,4(1H,3H)-dione (1.50 g, 1 Eq, 2.58 mmol) was dissolved in THF and cooled to 0° C. The reaction was then charged with TBAF (674 mg, 1 Eq, 2.58 mmol) at 0° C. and stirred for 1 hour at 0° C., then warmed to room temperature over the course of 2 hours. The mixture was then poured into a pre-cooled solution of 10% NaHCO₃ and DCM. The DCM layer was separated and dried over sodium sulfate. The crude product was purified on silica gel eluting with 5-25% DCM/methanol to afford the desired product.

benzyl(1-((2R,4S,5R)-5-(hydroxymethyl)-4-((2-nitrobenzyl)oxy)tetrahydrofuran-2-yl)-2-oxo-1,2-dihydropyrimidin-4-yl)carbamate (35.0 mg, 1 Eq, 70.5 μmol) was dissolved in trimethyl phosphate (1.5 mL) and 0.60 mL of dry pyridine and cooled in an ice bath under argon. A first aliquot of phosphoryl trichloride (39.6 mg, 3 Eq, 258 μmol) was added. Five minutes later, a second aliquot of 10 μL was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (311 mg, 4 Eq, 345 μmol) in dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 seconds at reaction time t=35 min. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (73.8 mg, 4 Eq, 345 μmol) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred for 30 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation.

Example 18

Procedures for synthesizing Class VI peptide & non-peptide dGTP analogs.

Schemes for Synthesis of Class VI— dGTP Constructs

Detailed Procedures for Non-Peptide dGTP Analogs

2-Amino-9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-1,9-dihydro-6H-purin-6-one (0.50 g, 1 Eq, 1.0 mmol) was dissolved in 5.0 mL of dry dimethylacetamide under argon. Oxirane (0.13 g, 3 Eq, 3.0 mmol) was added at ambient temperature, followed by sodium hydroxide (40 mg, 1 Eq, 1.0 mmol) as a solid. The mixture was stirred at ambient temperature for 4 h. The mixture was diluted with 50 mL of EtOAc and this was washed successively with 100 mL of water and 100 mL of brine. The EtOAc layer was dried with sodium sulfate and evaporated to leave a yellow oil. This was chromatographed on 40 g of silica using dichloromethane/methanol mixtures as eluent to provide a white foam.

2-Amino-9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-1-(2-hydroxyethyl)-1,9-dihydro-6H-purin-6-one (200 mg, 1 Eq, 370 μmol) was suspended in 5 mL of dry pyridine at ambient temperature under argon. The chloroformate (1 equiv) was added as a solid. The mixture was heated to 95° C. for 8 h and cooled to ambient temperature. The solvent was removed in vacuo and the residue was diluted with 50 mL of EtOAc and this was washed successively with 50 mL of water and 50 mL of brine. The EtOAc layer was dried with sodium sulfate and evaporated to leave a yellow oil. This was chromatographed on 40 g of silica using dichloromethane/methanol mixtures as eluent to provide a white foam.

The alcohol starting material was dissolved in dry THF at ambient temperature under argon. Two equivalents of triethylamine were added. A solution of the acyl chloride in THF was added dropwise at ambient temperature and the mixture was stirred for 18 h. The solvent was removed in vacuo and the residue was diluted with 50 mL of EtOAc and this was washed successively with 50 mL of water and 50 mL of brine. The EtOAc layer was dried with Na2SO4 and evaporated to leave a light brown solid. This was chromatographed on a silica column using dichloromethane/methanol mixtures as eluent to provide the corresponding ester.

The bis-silyl ether (1 Eq) was dissolved in dry THF at ambient temperature under argon. Triethylamine (8 Eq) was added rapidly, followed by triethylammonium fluoride dihydrofluoride (6 Eq), also added rapidly at ambient temperature. The mixture was stirred at ambient temperature for 24 h. Silica gel was added and the mixture was evaporated on a rotovap to a fine powder and then loaded onto a silica column and eluted with mixtures of dichloromethane and methanol to give the nucleoside as a slightly yellow foam.

The nucleoside was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of phosphoryl trichloride (1.5 Eq) was added. Five minutes later, a second aliquot of 1.5 Eq was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (4 Eq) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (4 Eq) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.
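Because the general triphosphorylation procedure above is expressed only in equivalents, the molar charge of each reagent depends on the amount of nucleoside used. The Python sketch below converts the stated equivalents into micromoles for a chosen scale. The function name and the dictionary layout are hypothetical, and the 38 μmol scale in the usage lines is simply the scale used in the earlier worked examples.

```python
# Equivalents stated in the general procedure; all are relative to the nucleoside.
EQUIVALENTS = {
    "phosphoryl trichloride, first aliquot": 1.5,
    "phosphoryl trichloride, second aliquot": 1.5,
    "tetrabutylammonium hydrogen diphosphate": 4.0,
    "N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine": 4.0,
}


def reagent_charges_umol(nucleoside_umol: float) -> dict[str, float]:
    """Return the charge of each reagent, in micromoles, for a given scale."""
    if nucleoside_umol <= 0:
        raise ValueError("nucleoside amount must be positive")
    return {name: eq * nucleoside_umol for name, eq in EQUIVALENTS.items()}


if __name__ == "__main__":
    # Example scale only: 38 micromoles of nucleoside.
    for name, umol in reagent_charges_umol(38.0).items():
        print(f"{name}: {umol:.0f} umol")
```

At 38 μmol of nucleoside this gives 114 μmol of phosphoryl trichloride in total and 152 μmol each of the diphosphate and the diamine, matching the 0.11 mmol and 0.15 mmol quoted in the detailed procedures at that scale.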

Detailed Procedures for Peptide-dGTP Analogs:

2-Amino-9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-1,9-dihydro-6H-purin-6-one (1.00 g, 1 Eq, 2.02 mmol) was dissolved in 30 mL of dry N,N-dimethylacetamide under argon. 4-Bromobutanoic acid (337 mg, 1 Eq, 2.02 mmol) was added at ambient temperature, followed by sodium hydroxide (161 mg, 2 Eq, 4.03 mmol), added as a solid. The mixture was heated to 80° C. and stirred for 12 h. The mixture was cooled to ambient temperature and diluted with 100 mL of EtOAc and this was washed successively with 50 mL of water and 50 mL of brine. The EtOAc layer was dried with Na2SO4 and evaporated to leave a light brown solid. This was chromatographed on a silica column using dichloromethane/methanol mixtures as eluent to provide 4-(2-amino-9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-6-oxo-6,9-dihydro-1H-purin-1-yl)butanoic acid as a white solid.

4-(2-amino-9-((2R,4S,5R)-4-((tert-butyldimethylsilyl)oxy)-5-(((tert-butyldimethylsilyl)oxy)methyl)tetrahydrofuran-2-yl)-6-oxo-6,9-dihydro-1H-purin-1-yl)butanoic acid (1 Eq) was suspended in 5 mL of dry pyridine at ambient temperature under argon. The chloroformate (1 equiv) was added as a solid. The mixture was heated to 95° C. for 8 h and cooled to ambient temperature. The solvent was removed in vacuo and the residue was diluted with 50 mL of EtOAc and this was washed successively with 50 mL of water and 50 mL of brine. The EtOAc layer was dried with sodium sulfate and evaporated to leave a yellow oil. This was chromatographed on 40 g of silica using dichloromethane/methanol mixtures as eluent to provide a white foam.

The carboxylic acid was dissolved or suspended in dry THF at ambient temperature. To this solution was added 1.3 Eq of triethylamine followed by 1.1 Eq of diphenylphosphoryl azide. The mixture was heated to reflux for 20 h and cooled to ambient temperature. Silica gel was added to the mixture and the solvent was evaporated to give a fine powder. This was loaded onto a column of silica gel and eluted with mixtures of EtOAc and dichloromethane to give the desired isocyanate as a colorless oil.

The bis-silyl ether (1 Eq) was dissolved in dry THF at ambient temperature under argon. Triethylamine (8 Eq) was added rapidly, followed by triethylammonium fluoride dihydrofluoride (6 Eq), also added rapidly at ambient temperature. The mixture was stirred at ambient temperature for 24 h. Silica gel was added and the mixture was evaporated on a rotovap to a fine powder and then loaded onto a silica column and eluted with mixtures of dichloromethane and methanol to give the nucleoside as a slightly yellow foam.

The nucleoside was co-evaporated with pyridine (1 mL×3) and dried on high vacuum overnight. It was then dissolved in 1.5 mL of trimethylphosphate and 0.60 mL dry pyridine and cooled in an ice bath under argon. A first aliquot of phosphoryl trichloride (1.5 Eq) was added. Five minutes later, a second aliquot of 1.5 Eq was added. The mixture was stirred an additional 30 min. A solution of tetrabutylammonium hydrogen diphosphate (4 Eq) in 1.5 mL dry DMF was prepared under Ar and cooled in an ice bath. This was added to the reaction mixture dropwise over 30 sec. Immediately, the preweighed N1,N1,N8,N8-tetramethylnaphthalene-1,8-diamine (4 Eq) was added as a solid in one portion. The mixture was stirred for 30 min after this addition and was quenched with 8 mL of cold 0.1 M TEAB buffer. The mixture was stirred in the ice bath for 10 min and then transferred to a separatory funnel. The solution was extracted 1× with 10 mL of EtOAc. The aqueous layer was transferred to a small tube for FPLC separation, which was conducted immediately after the EtOAc extraction. Final purification was by reverse phase HPLC.

Decaging of a 3′-O-(2-nitro-benzyl)-dATP and homopolymer synthesis is shown in FIGS. 20 and 21. 25 μM of 3′-O-(2-nitro-benzyl)-dATP (TriLink Technologies, San Diego, Calif.) was mixed with 1 μM of an oligonucleotide initiator (5′-biotin-TTTTTTGGCCTTTTUTAATAATAATAATAATTTTT, IDT, SEQ ID NO. 4) with 1× TdT reaction buffer (Thermo-Fisher), 2 U/μL of terminal deoxynucleotidyl transferase (Thermo-Fisher), and 0.002 U/μL inorganic pyrophosphatase (Thermo-Fisher). The reaction volume was then subjected to 20-22 mW/cm² of light at 365 nm for various intervals and then allowed to sit for 30 minutes at 37° C. After quenching by the addition of 0.1 M EDTA, each timepoint was mixed with an equal volume of 2× Novex TBE-urea gel loading buffer (Thermo-Fisher) and analyzed by polyacrylamide gel electrophoresis (15%), stained with Sybr Gold (Thermo-Fisher) and photographed with an ultraviolet transilluminator.
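Two quantities follow directly from the conditions in the decaging experiment above: the light dose delivered during an exposure, and the largest average tract length the nucleotide supply could support. The Python sketch below computes both. The function names are hypothetical, and the 60 second exposure is an assumed example value, since the disclosure specifies only "various intervals".

```python
def light_dose_mj_per_cm2(irradiance_mw_per_cm2: float, exposure_s: float) -> float:
    """Dose in mJ/cm2: irradiance (mW/cm2) multiplied by exposure time (s)."""
    return irradiance_mw_per_cm2 * exposure_s


def max_mean_extension(nucleotide_um: float, initiator_um: float) -> float:
    """Upper bound on the mean number of nucleotides added per initiator,
    reached only if every nucleotide is decaged and incorporated."""
    return nucleotide_um / initiator_um


if __name__ == "__main__":
    # ASSUMPTION: a 60 s exposure, chosen only for illustration.
    exposure_s = 60.0
    low = light_dose_mj_per_cm2(20.0, exposure_s)
    high = light_dose_mj_per_cm2(22.0, exposure_s)
    print(f"dose for {exposure_s:.0f} s: {low:.0f} to {high:.0f} mJ/cm2")
    # 25 uM nucleotide and 1 uM initiator, as stated in the experiment.
    print(f"maximum mean extension: {max_mean_extension(25.0, 1.0):.0f} nt")
```

With 25 μM nucleotide and 1 μM initiator, the average homopolymer tract cannot exceed 25 nucleotides per initiator, although individual strands may be longer or shorter than the average.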

INCORPORATION BY REFERENCE

References and citations to other documents, such as patents, patent applications, patent publications, journals, books, papers, web contents, have been made throughout this disclosure. All such documents are hereby incorporated herein by reference in their entirety for all purposes.

EQUIVALENTS

Various modifications of the invention and many further embodiments thereof, in addition to those shown and described herein, will become apparent to those skilled in the art from the full contents of this document, including references to the scientific and patent literature cited herein.

The subject matter herein contains important information, exemplification and guidance that can be adapted to the practice of this invention in its various embodiments and equivalents thereof.

1. A method of synthesizing a plurality of nucleic acid memory strands,the method comprising: providing an array of two or moresubstrate-linked nucleic acids with addressable delivery of activationenergy to each of the substrate-linked nucleic acids; extending one ormore of the substrate-linked nucleic acids with a homopolymer tract oftwo or more repeating nucleotides by delivering addressable activationenergy to one or more of the substrate-linked nucleic acids in thepresence of a plurality of blocked nucleotide analogs and atemplate-independent polymerase, wherein the template-independentpolymerase incorporates unblocked nucleotide analogs and not blockednucleotide analogs and wherein the addressable activation energyconverts the blocked nucleotide analogs into unblocked nucleotideanalogs.
2. The method of claim 1, wherein the blocked nucleotide analog is converted to an unblocked nucleotide analog by removal of a blocking group.
3. The method of claim 1, wherein the addressable activation energy comprises light, reducing conditions, pH change, or heat.
4. The method of claim 2, wherein the removable blocking group is at the 3′-OH of the blocked nucleotide analog.
5. The method of claim 2, wherein the removable blocking group is on the purine or pyrimidine base of the nucleotide analog.
6. The method of claim 1, wherein the blocked nucleotide analog comprises a removable blocking group on a 3′-OH of the deoxyribose or ribose of a nucleotide triphosphate and a non-removable modification on a purine or pyrimidine base of the nucleotide analog.
7. The method of claim 1, wherein the plurality of blocked nucleotide analogs are modified nucleotides of a same nucleobase comprising removable 3′-O-blocking groups and two or more non-removable molecular modifications that allow differentiation between the modified nucleotide analogs of the same nucleobase.
8. The method of claim 1 further comprising: stopping the extension; extending the homopolymer tract with an additional homopolymer tract of two or more repeating nucleotides by delivering addressable activation energy to the homopolymer tract in the presence of another plurality of blocked nucleotide analogs and the template-independent polymerase.
9. The method of claim 8, wherein the extension is stopped after a predetermined length of time in order to obtain a desired length for the homopolymer tract.
10. The method of claim 1, wherein rate of extension is modulated by modifications to the blocked nucleotide analogs.
11. The method of claim 10, wherein the rate modulating modifications are removed from the homopolymer tract after extension.
12. The method of claim 1, wherein the rate modulating modifications are removed during extension.
13. The method of claim 1, wherein the repeating nucleotides of the homopolymer tract are between 2 and about 10.
14. The method of claim 8, further comprising repeating the stopping and extending steps to synthesize a nucleic acid memory strand.
15. The method of claim 14, wherein the nucleic acid memory strand is from about 200 nucleotides in length to about 5,000 nucleotides in length.
16. The method of claim 1, wherein a predetermined concentration of the blocked nucleotide analogs is provided in the extending step to obtain a desired length for the homopolymer tract.
17. The method of claim 8, wherein the homopolymer tract and additional homopolymer tract comprise different nucleobases.
18. The method of claim 14, wherein the nucleic acid memory strand encodes a dataset selected from the group consisting of a text file, an image file, and an audio file.
19. The method of claim 18, further comprising displaying a readable format of the dataset.
20. The method of claim 14, wherein a unit of data is represented in base 2.